Security researchers disclosed details of a new attack method called Reprompt that allows malicious actors to exfiltrate sensitive data from AI chatbots like Microsoft Copilot with just one click on a legitimate link, bypassing corporate security controls without requiring plugins or additional user interaction with the assistant. Following responsible disclosure, the flaw was patched by Microsoft and does not affect corporate customers using Microsoft 365 Copilot, according to the company.
The attack exploits Copilot's URL parameter "q" to inject instructions directly from a link, for example, "copilot.microsoft[.]com/?q=Hello", turning the simple opening of that address into a trigger for executing hidden commands. Next, the attacker instructs Copilot to bypass protection guardrails by asking it to repeat each action twice, taking advantage of the fact that data leakage safeguards only apply to the initial request. Finally, the attack establishes a continuous chain of requests, in which Copilot follows dynamic instructions provided by an attacker-controlled server, maintaining active exfiltration even after the chat is closed.
In a hypothetical scenario, the attacker convinces the victim to click on a legitimate Copilot link sent via email, which initiates an automated sequence in which the service executes the hidden prompts in the "q" parameter and begins to "reprompt" the chatbot to fetch and share more information. These commands may include requests such as "summarize all files accessed by the user today," "where does the user live?," or "what vacations do they have planned?," making it impossible to identify which data is being exfiltrated just by inspecting the initial prompt, since the actual instructions follow in subsequent requests from the server. In this way, Reprompt creates an invisible data exfiltration channel, turning Copilot into a leakage vector without any manual prompt input by the user.
Like other attacks against language models, the root cause of Reprompt lies in the AI system's inability to differentiate between instructions entered directly by the user and those embedded in received requests, opening the door to indirect prompt injections when interpreting untrusted data. Researchers warn that there is no limit to the type or volume of data that can be exfiltrated, as the attacking server can adapt its requests based on previous responses, for example, adjusting questions to the victim's industry to obtain even more sensitive information. The findings add to a series of recent adversarial techniques — such as ZombieAgent, Lies-in-the-Loop, GeminiJack, CellShock, vulnerabilities in other corporate AI assistants, and exfiltration flaws across multiple platforms — reinforcing that prompt injection attacks remain a persistent risk and require defense in depth, limitation of privileges for sensitive tools, and rigorous monitoring of AI agents with access to critical corporate data.
This post was translated and summarized from its original version using AI, with human review.
With information from The Hacker News.