Skip to content
Business & governance

AI Agent Security Risks — and How to Mitigate Them

Orange ITS — AI engineering team 7 min read

Your new AI agent can read emails, call APIs, update CRM records, and send messages on behalf of your company. That is exactly what makes it useful — and exactly why its security posture deserves a harder look than a standard SaaS integration.

Classic application security was designed for systems that respond to authenticated users. Agentic AI is different: the agent decides what to do, which tools to invoke, and in what order, based on natural-language inputs it receives from users, external data, and even other agents. That autonomy introduces an attack surface most IT teams have not yet mapped.

This article covers the real risks in production agentic systems — prompt injection, tool misuse, and data exfiltration — and the practical mitigations available to IT leaders today.


Why AI Agent Security Needs Its Own Framework

Traditional threat modeling looks at authentication, authorisation, and data in transit. Those controls still matter. But agentic systems add three properties that create new exposure:

  • They consume untrusted content as instructions. An agent summarising a PDF, processing an inbound email, or reading a web page is executing against content it did not author. That content can carry adversarial instructions.
  • They hold credentials and can take actions. An agent connected to your calendar, your ERP, or your file store can read, write, and delete — not just look.
  • Their reasoning is opaque. Unlike a deterministic script, an agent’s path from input to action is not always predictable, making anomaly detection harder.

These three properties combine to create attack vectors that have no direct equivalent in conventional software.


The Three Core Attack Vectors in Agentic AI Security

1. Prompt Injection — the Most Documented AI Agent Attack Vector

Prompt injection is the most documented risk in deployed agentic systems. It works like this: an attacker embeds adversarial instructions inside content the agent will process — a document, a customer message, a web page. The agent, unable to distinguish the embedded instruction from a legitimate instruction, executes it.

Direct injection targets the system prompt or user interface directly. An employee asking an internal HR agent a malicious question, hoping to extract another employee’s salary data, is a direct injection attempt.

Indirect injection is subtler and harder to defend. A customer sends a support ticket containing hidden instructions — perhaps white text on white background in a document attachment — telling the agent to forward the conversation summary to an external address. The agent reads the document as part of its workflow and, without proper guardrails, acts on the embedded command.

Indirect injection attacks have been demonstrated against production systems connected to email, browsers, and document stores. Notable public disclosures include EchoLeak (CVE-2025-32711, CVSS 9.3) — an email-borne indirect injection patched in Microsoft 365 Copilot — Slack AI’s August 2024 private-channel data-leakage vulnerability, and GitHub Copilot’s CVE-2025-53773. The foundational academic taxonomy by Greshake et al. (2023) demonstrated attacks across email, browsers, and document stores.

Practical mitigations this quarter:

  • Apply strict input/output validation at every tool boundary — treat data the agent reads as untrusted, separate from instructions the agent receives.
  • Use separate system prompts that explicitly constrain the agent’s permitted actions; keep the instruction surface as narrow as the use case requires.
  • Implement output filtering to catch unexpected data patterns (e.g., email addresses or API keys appearing in agent responses where they shouldn’t).

2. Tool Misuse — When Scope Creep Becomes a Vulnerability

Agents gain their power from tools: the ability to call APIs, run queries, write files, send messages. That power needs explicit scope boundaries. Without them, a small prompt manipulation can lead an agent to take actions that were never intended.

Consider an agent built to answer internal IT helpdesk queries. If it also holds write access to the ticketing system, user directory, and email relay — because that was convenient during development — a crafted input could instruct it to create admin accounts, modify permissions, or exfiltrate ticket data. The agent is not “hacked” in the traditional sense; it is simply given an instruction it has no reason to refuse.

This is less a model problem than a design problem. Most tool-misuse vulnerabilities come down to over-privileged agents.

Practical mitigations:

  • Apply least-privilege by default at the tool level. An agent that reads data should rarely need write access; one that writes should almost never have delete permissions.
  • Define explicit allowed-action lists in your agent’s system configuration rather than relying on the model’s judgment to self-restrict.
  • Log every tool call with its full context — the input that triggered it, the tool invoked, and the output returned. You cannot investigate what you cannot trace.
  • For high-stakes tools (financial writes, user management, external communications), require a human-in-the-loop confirmation step before execution. This is not bureaucracy; it is the difference between a near-miss and an incident.

3. Data Exfiltration — the Quiet Risk in Retrieval-Augmented Agents

Agents that have access to internal knowledge bases, databases, or document stores introduce a data exfiltration risk that is different from a database breach. There is no single point of exploit — the agent itself becomes the exfiltration path.

An attacker who can influence the agent’s inputs (via injection, social engineering, or a compromised user account) can potentially direct it to retrieve and relay sensitive content. Because the agent is an authorised system, the retrieval does not look anomalous to the data store itself.

A secondary exfiltration risk comes from training and logging pipelines. If conversation logs, retrieved documents, or tool outputs are sent to external services for monitoring, fine-tuning, or analytics without proper data classification, sensitive content can leak through a completely different channel than the one you’re watching.

Practical mitigations:

  • Scope retrieval permissions by role: an agent serving sales should not query HR or finance documents, even if the underlying store holds all of them.
  • Apply data classification at the document or record level, not just at the system level. Tag content as internal, confidential, or restricted — and enforce those tags in the agent’s retrieval layer.
  • Audit what leaves your environment. Any logs or conversation data sent to a third-party service (LLM provider, monitoring platform) should go through the same data-handling review you apply to any SaaS vendor.

A Practical Security Checklist for Deployed Agents

Before signing off on any agentic system for production use, an IT leader should be able to answer yes to each of these:

  • Does each agent operate with least-privilege tool access — only what it needs for its defined task?
  • Are system prompts explicit about what the agent is and is not permitted to do?
  • Is all content the agent reads (documents, emails, web pages) treated as untrusted input, with output validation applied?
  • Is every tool call logged with sufficient context to reconstruct what happened and why?
  • Are there human-approval gates for irreversible or high-impact actions?
  • Has the data flowing through agent logs and monitoring pipelines been reviewed for sensitive content?
  • Is there a process to review and update permissions as the agent’s task scope changes?

None of these requires a new security product. Most require deliberate design decisions at the architecture and deployment stage — which is the right time to address them.


Who This Affects Most

Security risks scale with access. An agent with read-only access to a single FAQ database carries minimal risk. An agent connected to your CRM, finance system, email, and customer data — as many production agents are — carries material risk if deployed without the controls above.

The organisations most exposed are typically those who moved quickly from prototype to production, treating the agent as a standalone application rather than as a privileged system actor. The functionality worked; the security posture was not revisited. That gap is worth closing before the agent fleet grows.

For a fuller view of how governance, testing, and security fit together across a deployed agent programme, see our guides on AI agent governance for SMEs and testing AI agents with evals. If you are evaluating whether your current setup introduces GDPR exposure through agent data handling, AI agents and GDPR covers that ground specifically.

Understanding why agentic systems are architecturally different from simple automation also helps frame the security requirements — agentic workflows explained is a good starting point if that context is useful.


How Orange ITS Approaches Agentic AI Security

At Orange ITS, security and governance are built into the architecture from the first design session — not retrofitted after go-live. When we scope an agent project with a client, we define tool permissions, data access boundaries, logging requirements, and human-approval thresholds before writing a line of code.

This is part of a broader engineering discipline we apply across AI agent development: the goal is agents that work reliably in production, not agents that work in demos. For Swiss and European clients, that also means addressing GDPR, the nFADP, and sector-specific AI Act obligations as first-class constraints.

If you have an agent in production — or are about to deploy one — and want a direct assessment of your current security posture, book a 30-minute call with our team. We will walk through your architecture, identify the specific gaps, and give you a prioritised list of mitigations you can act on. Get in touch via our contact page.

Insights

Put these ideas to work

A 30-minute call is enough to find out whether an AI agent fits your workflow — and what it would return.