How Secure Are Your AI Agents?
An AI agent inside your organization right now is doing something consequential. It may be querying a database, updating a customer record, approving a workflow, or calling an external API. It is making decisions and taking actions — in your name, on your systems, under your regulatory obligations.
Now answer this question honestly: Can you even detect what it is doing? And if something goes wrong, can you prove to a regulator what it did, why it did it, and who authorized it?
For the vast majority of enterprises deploying AI agents today, the answer is no. Not because the agents aren't capable — they are. But because the platforms delivering them were built by developers optimizing for capability, and security was left as the deploying organization's problem. That worked when agents were demos. It does not work when they are running in a hospital, a bank, or a claims processor.
40% of enterprise applications will embed task-specific AI agents by the end of 2026, up from less than 5% in 2025.
Gartner, August 2025
81% of technical teams are already in active testing or production. Only 14.4% have full security approval for those deployments.
State of AI Agent Security 2026 Report, Gravitee — 900+ executives and practitioners surveyed
73% of CISOs are very or critically concerned about AI agent risks. Only 30% have mature safeguards in place.
State of AI Agent Security 2026, NeuralTrust — 160+ CISOs and security leaders surveyed
These are not distant projections. They describe the current operating environment. The agents are already there. The security foundations, in most cases, are not.
The uncomfortable truth is that the frameworks needed to govern AI agents securely are not new. They exist. Enterprises already operate under them. The problem is that the current generation of agent platforms was never evaluated against them. What follows is that evaluation.
STRIDE: Six Threats Your Agent Probably Fails
STRIDE is Microsoft's threat modelling framework. Its six categories — Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privilege — were developed for traditional software but map directly onto AI agent deployments.
Spoofing. Can this agent be impersonated by a malicious actor, or can it impersonate other services and systems to extract data or permissions it shouldn't have? Most agent frameworks have no formal agent identity at all — the agent operates under whatever service account, API key, or ambient credentials the deployment environment provides. Before you can verify an agent's identity cryptographically, the agent needs an identity worth verifying. Most do not.
Tampering. Can the agent's outputs, tool call results, or intermediate reasoning be modified in transit? In multi-agent architectures — where one agent passes results to another — a single tampered output can corrupt an entire downstream workflow before any human sees it.
Repudiation. Can the agent deny that an action occurred? More practically: can you prove that it happened, what triggered it, and on whose authority? This is the central question in every post-incident regulatory review.
Information Disclosure. Can the agent be induced to surface data it should never expose — system prompts, user data from other sessions, API credentials embedded in its context, or sensitive documents it was given access to for a different purpose?
Denial of Service. Can the agent be looped, flooded, or starved of resources through crafted inputs? Autonomous agents expand this attack surface in four distinct ways: agent flooding; downstream amplification; financial exhaustion, a threat STRIDE never anticipated because it predates pay-per-use cloud economics; and human cognitive overload, where agents generate volumes of plausible but low-quality output that overwhelm the humans responsible for reviewing it.
Elevation of Privilege. Can the agent exceed its defined scope — accessing tools, data, or systems beyond what any single task requires — through prompt injection or chained API calls? This is the most operationally dangerous failure mode, because agents are often provisioned with far more access than they need for any given workflow.
Most agent frameworks have never been evaluated against STRIDE. Not incompletely evaluated — never evaluated at all.
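To make the Elevation of Privilege threat concrete, here is a minimal Python sketch of scope enforcement at the execution layer. It assumes a design in which the agent never holds tool handles directly and every call passes through a gateway that checks a per-task allowlist. The names ToolGateway, TaskScope, ScopeViolation, and the two example tools are hypothetical and do not come from any particular agent platform.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, FrozenSet


class ScopeViolation(PermissionError):
    """Raised when an agent requests a tool outside its task scope."""


@dataclass(frozen=True)
class TaskScope:
    # Hypothetical: the tool names a single task is permitted to call.
    task_id: str
    allowed_tools: FrozenSet[str]


@dataclass
class ToolGateway:
    # Every tool call goes through invoke(); the agent never holds tool handles.
    tools: Dict[str, Callable[..., Any]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self.tools[name] = fn

    def invoke(self, scope: TaskScope, name: str, **kwargs: Any) -> Any:
        # Checked on every call at runtime, not once at configuration time.
        if name not in scope.allowed_tools:
            raise ScopeViolation(f"task {scope.task_id!r} may not call {name!r}")
        if name not in self.tools:
            raise KeyError(f"unknown tool {name!r}")
        return self.tools[name](**kwargs)


# Illustrative tools only; real tools would wrap databases or external APIs.
gateway = ToolGateway()
gateway.register("read_claim", lambda claim_id: {"claim_id": claim_id, "status": "open"})
gateway.register("approve_payment", lambda claim_id: f"paid {claim_id}")

# A summarization task gets read access and nothing else.
scope = TaskScope(task_id="summarize-claim", allowed_tools=frozenset({"read_claim"}))
```

With this scope, gateway.invoke(scope, "read_claim", claim_id="C-1") succeeds, while gateway.invoke(scope, "approve_payment", claim_id="C-1") raises ScopeViolation even though the tool is registered. The point of the design is that a successful prompt injection can change what the agent asks for, but not what the gateway allows.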
NIST CSF: You Have Two of Five
The NIST Cybersecurity Framework organizes security posture around five operational functions: Identify, Protect, Detect, Respond, and Recover. (CSF 2.0 adds a sixth, Govern, which spans the other five.) Current AI agent platforms, in the most generous assessment, address Identify and Protect. The three missing functions are where the exposure lives.
Detect requires continuous monitoring of agent behavior at runtime — not logs reviewed after the fact, but active anomaly detection. Almost no agent platform provides this.
The instinctive response from security teams is that this is not the agent vendor's problem. But agents are not traditional application software. Traditional applications execute deterministic code paths. An agent's behavior is non-deterministic by design. Your EDR was built to detect known-bad patterns in deterministic systems. It cannot interpret whether an autonomous agent's decision chain is legitimate or compromised.
The ask is not that agent platforms replace your security stack. It is that they produce the telemetry, enforce the boundaries, and expose the control surfaces that make the agent governable by your security stack. Without that, your SIEM is monitoring the container the agent runs in — not the agent. That is the equivalent of monitoring a building's HVAC system and concluding the employees inside are behaving normally.
Respond requires the ability to contain an active threat autonomously — to isolate a misbehaving agent, revoke its tool access, and halt its execution without requiring a human to first notice the problem, diagnose it, and manually intervene.
Recover requires structured restoration with full compliance documentation — a verifiable record of what the agent did, what was affected, what was restored, and what controls were applied.
Three of five NIST CSF functions are functionally absent from the current generation of agent platforms.
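As a rough illustration of what Detect and Respond could look like at the agent execution layer, the Python sketch below halts an agent automatically once it exceeds a call-rate or spend limit. It is deliberately simplistic: fixed thresholds stand in for real anomaly detection, and the class name ActionBreaker, its parameters, and the limits are assumptions made for the example.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class AgentHalted(RuntimeError):
    """Raised once the breaker has tripped; no further actions are allowed."""


class ActionBreaker:
    """Halts an agent automatically when it exceeds a call-rate or spend limit."""

    def __init__(self, max_calls: int, window_s: float, max_spend: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls      # actions allowed per sliding window
        self.window_s = window_s        # window length in seconds
        self.max_spend = max_spend      # cumulative cost ceiling
        self.clock = clock              # injectable so the logic can be tested
        self.calls: Deque[float] = deque()
        self.spend = 0.0
        self.halt_reason: Optional[str] = None

    def record(self, cost: float) -> None:
        """Call before each agent action; raises AgentHalted if limits are hit."""
        if self.halt_reason is not None:
            # Once tripped, the breaker stays tripped until a human resets it.
            raise AgentHalted(self.halt_reason)
        now = self.clock()
        # Drop timestamps that have aged out of the sliding window.
        while self.calls and now - self.calls[0] > self.window_s:
            self.calls.popleft()
        self.calls.append(now)
        self.spend += cost
        if len(self.calls) > self.max_calls:
            self.halt_reason = "call rate exceeded"
        elif self.spend > self.max_spend:
            self.halt_reason = "spend budget exceeded"
        if self.halt_reason is not None:
            raise AgentHalted(self.halt_reason)
```

An execution loop would call breaker.record(cost) before each tool call and treat AgentHalted as the signal to revoke credentials and stop. The breaker does not reset itself, so containment happens without a human, but resumption requires one.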
OWASP LLM Top 10: Architectural Problems, Not Policy Problems
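Five risks are critical for any enterprise running AI agents. Four come directly from the OWASP list; the fifth, regulatory boundary violation, is not a named OWASP category but follows from the same weaknesses. All five share a defining characteristic: they are architectural problems that cannot be solved by policy alone.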
Prompt injection shares a lineage with SQL injection — both exploit the confusion between trusted instructions and untrusted input. But LLMs have no parameterization boundary. The data is the instruction set. Defense requires multiple enforcement layers external to the model: input sanitization, output validation, and execution-layer sandboxing.
Insecure output handling occurs when an agent's outputs are passed to downstream systems without validation. The risk is not in the generation; it is in the unvalidated handoff.
Excessive agency is the most pervasive risk. Agents are frequently provisioned with broad tool access to make them maximally useful, then deployed into production with that full access intact. The blast radius of a compromised or misbehaving agent is the full scope of everything it can reach.
Regulatory boundary violation — agents that process data without enforced boundaries on what classifications are permissible. The agent does not know whether the data it just accessed requires HIPAA handling, GDPR consent, or SOC 2 audit controls. If the platform does not enforce those boundaries at the execution layer, the agent will cross them.
Sensitive information disclosure — through direct extraction, indirect leakage, or cross-session contamination. These are the default failure modes of agents that have not been hardened at the execution layer.
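The unvalidated handoff described under insecure output handling is the easiest of these to illustrate. The Python sketch below assumes an agent that emits a JSON refund request for a downstream payment system. The schema, the field names, and the 50,000-cent ceiling are invented for the example; a real deployment would more likely use a schema-validation library than hand-written checks.

```python
import json
from typing import Any, Dict

# Hypothetical contract for one downstream system: field name -> required type.
REFUND_SCHEMA: Dict[str, type] = {"customer_id": str, "amount_cents": int}
MAX_REFUND_CENTS = 50_000  # illustrative business limit, an assumption


class OutputRejected(ValueError):
    """Raised when agent output fails validation and must not be forwarded."""


def validate_refund(raw_output: str) -> Dict[str, Any]:
    """Parse and check agent output before any downstream system sees it."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise OutputRejected("output is not valid JSON") from exc
    # Require exactly the expected fields: no extras, none missing.
    if not isinstance(data, dict) or set(data) != set(REFUND_SCHEMA):
        raise OutputRejected("unexpected or missing fields")
    for key, expected in REFUND_SCHEMA.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(data[key], bool) or not isinstance(data[key], expected):
            raise OutputRejected(f"{key} has the wrong type")
    if not 0 < data["amount_cents"] <= MAX_REFUND_CENTS:
        raise OutputRejected("amount outside permitted range")
    return data
```

Only the dictionary returned by validate_refund is passed on; anything else raises OutputRejected and stops at the boundary. The check sits outside the model, so it holds regardless of what the agent was persuaded to generate.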
SOC 2 and HIPAA: The Compliance Trap
SOC 2 and HIPAA share a foundational requirement: audit trails that are complete, attributable, and tamper-evident. Every consequential action must be logged. Every log must be verifiable. Every record must be traceable to a specific actor, instruction, and authorization.
An AI agent that executes consequential actions with no cryptographically verifiable record of what it did, why it decided to do it, and on whose authority, is not deployable in any regulated industry. Not as a matter of best practice. As a matter of legal compliance.
The difference between a timestamped application log and a cryptographically signed, append-only audit trail is the difference between a record that satisfies a regulator and one that does not.
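One way to picture that difference is a hash-chained log in which each entry carries a keyed MAC over its own content and the MAC of the previous entry. The Python sketch below is an illustration under stated assumptions, not a compliance-grade design: it uses HMAC-SHA256 from the standard library as a stand-in for a true digital signature, it assumes the key lives somewhere the agent cannot read, and the function names and record fields are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous MAC" for the first entry


def _digest(key: bytes, prev_mac: str, record: Dict[str, Any]) -> str:
    # Canonical JSON so the same record always serializes to the same bytes.
    payload = json.dumps({"prev": prev_mac, "record": record},
                         sort_keys=True, separators=(",", ":"))
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def append_entry(log: List[Dict[str, Any]], key: bytes, record: Dict[str, Any]) -> None:
    """Append a record whose MAC covers both its content and the prior entry."""
    prev_mac = log[-1]["mac"] if log else GENESIS
    log.append({"prev": prev_mac, "record": record,
                "mac": _digest(key, prev_mac, record)})


def verify_log(log: List[Dict[str, Any]], key: bytes) -> bool:
    """Return False if any entry was altered, reordered, or cut from the middle."""
    prev_mac = GENESIS
    for entry in log:
        expected = _digest(key, prev_mac, entry["record"])
        if entry["prev"] != prev_mac or not hmac.compare_digest(entry["mac"], expected):
            return False
        prev_mac = entry["mac"]
    return True
```

Each record would carry the fields a reviewer asks for, such as the actor, the triggering instruction, the tool called, and the authorizing identity. Changing any earlier record breaks every MAC after it. Two limits are worth stating plainly: removing entries from the end of the log is not detected without anchoring the latest MAC somewhere external, and a shared-key MAC proves integrity only to parties holding the key, which is why production systems tend toward asymmetric signatures.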
"Capable" and "compliant" are not the same word — and in practice, they are frequently in direct tension. The architectural shortcuts that make an agent demo impressive are precisely the properties that make it undeployable in a regulated environment.
Who Owns the Risk?
The answer has always been you. What has changed is not the ownership. It is the stakes. When agents were experiments, the risk was theoretical. Now that they are executing consequential actions in production, the gap between vendor responsibility and deployer liability is measured in regulatory penalties, enforcement actions, and board-level accountability.
None of the frameworks described in this article are new. STRIDE has existed since 1999. NIST CSF was first published in 2014. The gap is not regulatory; the requirements are clear. The gap is commercial.
Your board will ask what controls were in place. Your regulator will ask for the audit trail. Your legal team will ask what due diligence was conducted before deployment. That is not a vendor problem. It never was. It is yours.
Questions Every CISO Should Be Asking Right Now
A vendor that has done the work will answer these specifically. A vendor that has not will answer in generalities. If you cannot get a direct, technical answer to any of the following, you have your assessment.
Can you provide a cryptographically signed, append-only audit log of every action this agent took — including the specific instruction that triggered it, the tool or API it called, the data it accessed, and the identity that authorized the action?
How does your platform enforce tool scope at the execution layer — not at configuration time, but at runtime, for every individual action the agent takes?
What is your platform's detection capability for anomalous agent behavior at runtime? Where does that detection occur — at the agent execution layer, or does it rely on the enterprise SIEM?
What is your platform's response capability when anomalous or malicious behavior is detected? Can the agent be isolated and halted automatically, without requiring human intervention?
What controls does your platform implement to prevent prompt injection attacks? Are these enforced at the execution layer, or are they guidance provided to the model?
Has your platform been formally evaluated against STRIDE threat categories as they apply to AI agent deployments?
How does your platform handle multi-agent architectures where one agent passes outputs to another? What validation occurs at each handoff?
What is your platform's approach to the principle of least privilege for agent tool access? Can tool permissions be scoped to individual tasks rather than sessions?
Security cannot be bolted on after the fact. It must be present at the execution layer from the first deployment. The frameworks to evaluate it exist. The questions to ask are not complex. What has been missing, until now, is the urgency to ask them.
The clock is already running. The agents are already in production.
Sources
Gartner. "Gartner Predicts 40% of Enterprise Apps Will Feature Task-Specific AI Agents by 2026." August 2025.
Gravitee. "State of AI Agent Security 2026 Report." February 2026. (900+ executives and practitioners.)
NeuralTrust. "The State of AI Agent Security 2026." November 2025. (160+ CISOs and security leaders.)
Cloud Security Alliance & Google Cloud. "The State of AI Security and Governance." December 2025.
Enterprise Management Associates. "Agentic AI Identities: The Unsecured Frontier." December 2025.
Deloitte. "State of AI in the Enterprise 2026." 3,235 senior leaders globally.
NIST Cybersecurity Framework (CSF) 2.0. National Institute of Standards and Technology.
OWASP Top 10 for Large Language Model Applications. OWASP Foundation.
Microsoft STRIDE Threat Modelling Framework. Microsoft Security Development Lifecycle.