Beyond the Floor
The previous articles in this series established something specific: the AI industry has built most of its agent platforms below the architectural floor that regulated production demands, and the cost of staying below that floor compounds across alignment, governance, and integration in ways most organizations have not yet added up correctly. The arc of those four pieces made the case for getting the foundation right.
This article is about what gets built when an organization commits to building above it.
The architectural commitment that produces the floor — refusing to retrofit values, security, or governance onto frameworks not designed to hold them — does not stop at the floor. The same engineering instinct that produced the foundation produced what sits on top of it. What follows is a description of that structure: five layers of runtime defense that, taken together, constitute a category-defining detection-and-response architecture for agentic AI. The reader who has followed the series this far will recognize the shape of the argument. The reader who has not will see a snapshot of what an AI agent platform actually looks like when it has been engineered, from the foundation up, for the loads ahead.
The Thesis: A Luxury Earned
Most security architectures in production today are designed around a single objective: minimize what an attacker can do. Detect the intrusion. Contain the breach. Eject the adversary. Restore the system. This is not wrong. It is foundational, and it is exactly what an organization without architectural governance has to do, because all the energy of its security program is consumed by the minimization itself.
IV's Organic Security Posture starts from a different premise. The architectural commitments described in the prior articles — the tamper-evident audit chain, the credential-free agent execution, the deterministic authority boundary, the autonomous containment — do not merely contribute to defense. They minimize what an attacker can do as an architectural property, which means the platform's runtime defense apparatus does not have to be entirely consumed by minimization. Having minimized, the platform earns a different kind of capacity: the capacity to learn.
Detecting an attacker is not the goal. Containing the breach is not the goal. The goal is to extract from every confirmed engagement the maximum possible intelligence — and to fold that intelligence into the platform's posture so that the next attacker faces a stronger architecture than the last one did.
That capacity is the thesis behind every layer described below. The PDR³ Reinforcement loop is the architectural commitment that operationalizes it, and the five layers are what make that commitment real.
Layer 1 — Probabilistic Detection
The first layer is the one most enterprise security teams already know how to think about: continuous monitoring against expected behavior. The Central Inspection Agent runs pre-flight validation before every agent run, runtime monitoring during execution, and post-run analysis after completion. Behavioral baselines are established at agent deployment and updated under governance as the agent legitimately evolves. Deviation scoring identifies when an agent is operating outside its expected envelope. The Audit Intelligence Agent correlates these signals across the entire fleet, looking for cross-domain drift no single agent's monitoring could surface alone.
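The following is a minimal sketch of the baseline-and-deviation idea, assuming a simple per-metric z-score model. The article does not say how the Central Inspection Agent scores deviation, so the metric names (`tool_calls`, `bytes_read`), the `Baseline` type, and the 3.0 threshold are all hypothetical.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Baseline:
    """Per-metric mean and spread recorded at deployment (hypothetical model)."""
    mean: dict[str, float]
    stdev: dict[str, float]

    @classmethod
    def from_runs(cls, runs: list[dict[str, float]]) -> "Baseline":
        metrics = list(runs[0])
        return cls(
            mean={m: statistics.fmean([r[m] for r in runs]) for m in metrics},
            stdev={m: statistics.pstdev([r[m] for r in runs]) for m in metrics},
        )


def deviation_score(baseline: Baseline, run: dict[str, float]) -> float:
    """Largest absolute z-score across the run's metrics; 0.0 means exactly on baseline."""
    worst = 0.0
    for metric, value in run.items():
        spread = baseline.stdev[metric] or 1e-9  # avoid dividing by zero for constant metrics
        worst = max(worst, abs(value - baseline.mean[metric]) / spread)
    return worst


def outside_envelope(baseline: Baseline, run: dict[str, float], threshold: float = 3.0) -> bool:
    """Flag a run whose worst metric deviates by more than `threshold` standard deviations."""
    return deviation_score(baseline, run) > threshold


history = [
    {"tool_calls": 10, "bytes_read": 1000},
    {"tool_calls": 12, "bytes_read": 1100},
    {"tool_calls": 11, "bytes_read": 900},
]
baseline = Baseline.from_runs(history)
print(outside_envelope(baseline, {"tool_calls": 11, "bytes_read": 1000}))  # False
print(outside_envelope(baseline, {"tool_calls": 40, "bytes_read": 1000}))  # True
```

Because the baseline is just data, re-baselining as an agent legitimately evolves could be treated as a governed, audited action rather than a silent update.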
This layer catches the careless attacker. It catches gradual compromise. It catches the kind of degradation that produces measurable signal against a baseline. What it does not catch is the patient, careful adversary who understands the detection model and deliberately moves slowly and quietly enough to stay below every probabilistic threshold. Probabilistic detection has been, and remains, the foundation of enterprise anomaly detection. It cannot be the only foundation.
What this layer learns becomes the baseline that defines normal for the next cycle. The longer it operates in a deployment, the more accurately it characterizes that deployment's specific operational fingerprint.
Layer 2 — Cross-Stream Pattern Correlation
The second layer addresses a category of pattern that no single stream of monitoring data will reveal. Enterprise environments produce continuous streams of events — agent actions, governance decisions, infrastructure metrics, external service interactions, security signals. Valuable patterns often emerge not within one stream but across several simultaneously, where temporal correlation, causal relationships, or statistical co-occurrence surface insight no single-stream analysis could produce.
Complex Event Processing (CEP) engines have addressed parts of this problem for the better part of two decades. Engines such as Esper and Flink CEP match predefined patterns through rule-based detection with temporal windowing. They are powerful for what they were designed to do. They were not designed for AI agent security, and three of their structural properties show why. They are rule-only: novel patterns that no rule anticipated remain invisible. They are monolithic: there is no governed separation between stream-specific analysis and cross-stream correlation. And they lack enterprise governance integration: pattern detection results are not wrapped in cryptographic audit trails, deployment governance, or human-in-the-loop frameworks.
IV's Pattern Detection Agency is a fundamentally different architecture for the same problem. It decomposes pattern detection into governed sub-agents — Pattern Focus Agents, each watching a specific data domain or stream — coordinated by a supervisor that runs cross-domain correlation across all observations. Each sub-agent operates under its own governance profile, runs its own pluggable detection strategies, and produces observations that flow into a single chained, immutable audit record. AI reasoning is a first-class detection strategy alongside statistical and rule-based ones, which means the architecture catches patterns no rule writer anticipated. The supervisor's correlation across sub-agents catches multi-stream coordination invisible to any individual sub-agent.
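To make cross-stream correlation concrete, here is a sketch of a supervisor that flags an entity when weak signals from several distinct sub-agent streams land within one time window. It covers only a statistical co-occurrence strategy. It does not model AI-reasoning strategies, per-agent governance profiles, or the audit chain, and every name in it (`Observation`, `correlate`, the stream labels, the `floor` parameter) is hypothetical rather than IV's API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    stream: str       # data domain a focus sub-agent watches (hypothetical labels)
    entity: str       # agent or principal the observation is about
    timestamp: float  # seconds
    score: float      # sub-agent's local anomaly score, possibly below its own alert threshold


def correlate(observations, window: float, min_streams: int = 3, floor: float = 0.2) -> set:
    """Flag entities with weak signals in >= min_streams distinct streams within `window` seconds."""
    by_entity = defaultdict(list)
    for obs in observations:
        if obs.score >= floor:  # keep weak-but-nonzero signals only
            by_entity[obs.entity].append(obs)

    flagged = set()
    for entity, items in by_entity.items():
        items.sort(key=lambda o: o.timestamp)
        start = 0
        for end in range(len(items)):
            # Advance the left edge until the window spans at most `window` seconds.
            while items[end].timestamp - items[start].timestamp > window:
                start += 1
            if len({o.stream for o in items[start:end + 1]}) >= min_streams:
                flagged.add(entity)
                break
    return flagged


observations = [
    Observation("governance_decisions", "agent-7", 0.0, 0.40),
    Observation("infrastructure_metrics", "agent-7", 120.0, 0.30),
    Observation("external_services", "agent-7", 250.0, 0.35),
    Observation("governance_decisions", "agent-9", 0.0, 0.40),
    Observation("governance_decisions", "agent-9", 3000.0, 0.40),
]
print(correlate(observations, window=300.0))  # {'agent-7'}
```

None of agent-7's three signals needs to cross a per-stream alert threshold. The flag comes purely from their co-occurrence, which is the property a supervisor adds over independent sub-agents.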
What this layer learns is the shape of coordinated activity across a deployment. It is the layer that catches the slow-poisoning attack, the governance erosion campaign, the coordinated multi-vector probe that no single signal would surface above its threshold.
Before any customer touches the Pattern Detection Agency, IV runs it against IV — four sub-agents watching the platform's own oversight infrastructure for slow poisoning, governance erosion, coordination anomalies, and resource manipulation. The architecture defends its builders before it defends its customers.
A runtime sentinel that cannot defend its own platform cannot defend a customer's.
Layer 3 — Deterministic Tripwire Detection
The third layer is the one that closes a gap the first two cannot close. Probabilistic detection and pattern correlation, taken together, make evasion nearly impossible for a human attacker. A patient human adversary moving carefully across one stream still produces correlations across others, and the cross-domain signal gives the system something to catch. The gap they leave open is the one the next generation of adversary is already moving into: an AI-driven attacker that operates at machine speed, generates evasion variations algorithmically, and can probe the correlation surface across more dimensions, more cycles, and more strategies than a human red team would ever attempt. That class of adversary can, in principle, find a path below both the threshold layer and the correlation layer that no human strategy could discover. The architectural answer is to introduce a class of detection that does not depend on threshold or correlation at all.
The Canary Detection Agent operates on a deterministic, zero-false-positive model. The platform distributes canary surfaces across the agent environment — credentials, agent definitions, role tokens, query patterns — that have no legitimate operational use. Any contact with a canary surface is binary proof of unauthorized access. There is no threshold to stay below. There is no baseline to mimic. There is no patient strategy that avoids the canary, because the canary, by definition, is something a legitimate operator will never touch. Speed and patience provide no advantage against a deterministic tripwire.
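A canary check can be sketched as a plain membership test, which is what makes it deterministic: there is no score to tune, only whether a planted token was present. The token format, the `CanaryRegistry` and `CanaryHit` types, and the substring scan over a request payload are illustrative assumptions, not a description of how the Canary Detection Agent is built.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CanaryHit:
    principal: str    # who made contact
    surface: str      # which decoy was touched
    access_path: str  # how it was reached, e.g. the tool or endpoint used


class CanaryRegistry:
    """Deterministic tripwire: a hit is a membership test, with no threshold or baseline."""

    def __init__(self) -> None:
        self._surfaces: dict[str, str] = {}  # decoy token -> description of the surface

    def plant(self, description: str) -> str:
        """Mint an unguessable decoy token that no legitimate workflow references."""
        token = f"cnry_{secrets.token_hex(16)}"
        self._surfaces[token] = description
        return token

    def inspect(self, principal: str, payload: str, access_path: str) -> Optional[CanaryHit]:
        """Return a hit if any planted token appears anywhere in the payload."""
        for token, description in self._surfaces.items():
            if token in payload:
                return CanaryHit(principal, description, access_path)
        return None


registry = CanaryRegistry()
decoy_key = registry.plant("decoy service-account credential")
hit = registry.inspect("agent-42", f"Authorization: Bearer {decoy_key}", "http_request")
clean = registry.inspect("agent-42", "SELECT id FROM orders LIMIT 10", "sql_query")
print(hit, clean)
```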
The two detection profiles together — probabilistic detection that catches gradual compromise, deterministic detection that catches any access regardless of how slowly or how cleverly the adversary arrived — close the evasion path either profile alone leaves open. The patient human attacker triggers a canary the moment their patience pays off. The machine-speed adversary, generating thousands of variations to slip past the probabilistic and correlation layers, triggers a canary the moment any one variation succeeds. There is no third strategy that evades both. The deterministic tripwire is the architectural commitment that the platform will catch the attacker no probabilistic system was ever going to catch.
What this layer learns is unambiguous. Every canary contact is forensic data: who touched what, through which access path, at what attempted scope. The signal is binary, but the context around the signal is rich.
Layer 4 — Active Intelligence Harvesting
The fourth layer operationalizes the "maximize what we learn" thesis directly. When an attacker has been confirmed, whether through canary contact, pattern correlation, or any combination of the preceding layers, the platform does not merely contain them. It engages.
The architecture provides three engagement capabilities, each invoked by the platform's response authority and each producing forensic intelligence the contained attacker would never have surrendered voluntarily. The Honeypot Deception Agent presents as a legitimate fleet agent with standard health status, standard activity, and standard responses; an attacker who interacts with it receives plausible synthetic responses while every probe, query, and lateral movement attempt is recorded. The Response Substitution Engine intercepts tool calls from a confirmed-compromised agent and substitutes synthetic responses for real data — the attacker's reaction to the synthetic data reveals what they were actually looking for, turning their queries into a structured discovery mechanism for their own intent. The Synthetic Corridor Infrastructure provides the most complete capability: full migration of a compromised agent into an isolated synthetic environment whose composition is generated to match the customer's actual deployment shape, so the attacker continues operating in what appears to be the real environment while every interaction is forensically captured and no real system is touched.
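The Response Substitution Engine's core move, swapping real tool output for synthetic output once an agent is confirmed compromised, can be sketched as a routing wrapper. This is a hypothetical illustration: the class, the tool name `lookup_account`, and the log shape are assumptions, and generating convincing synthetic data is left to whatever synthetic tool is registered.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

ToolFn = Callable[[dict[str, Any]], Any]


@dataclass
class ResponseSubstitutionEngine:
    """Routes tool calls: real tools for healthy agents, synthetic twins for compromised ones."""
    real_tools: dict[str, ToolFn]
    synthetic_tools: dict[str, ToolFn]
    compromised: set = field(default_factory=set)
    forensic_log: list = field(default_factory=list)

    def call(self, agent_id: str, tool: str, args: dict[str, Any]) -> Any:
        if agent_id not in self.compromised:
            return self.real_tools[tool](args)
        # Confirmed-compromised agent: record what it asked for, never touch the real tool.
        self.forensic_log.append({"agent": agent_id, "tool": tool, "args": dict(args)})
        return self.synthetic_tools[tool](args)


engine = ResponseSubstitutionEngine(
    real_tools={"lookup_account": lambda a: {"account": a["id"], "balance": 1520.75}},
    synthetic_tools={"lookup_account": lambda a: {"account": a["id"], "balance": 2380.10}},
    compromised={"agent-13"},
)
normal = engine.call("agent-2", "lookup_account", {"id": "AC-1001"})
probed = engine.call("agent-13", "lookup_account", {"id": "AC-2002"})
print(normal, probed, engine.forensic_log)
```

The design choice worth noting is that the compromised branch never calls the real tool, so the forensic log captures the attacker's intent without any real data leaving the system.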
This is not entrapment. The attacker chose to interact with the platform's deception layer; the platform records what they reveal about themselves. It is also not an offensive operation: the architecture does not pursue an attacker outside the defended deployment, does not exfiltrate data from the attacker, and does not exceed the customer's environment boundary. Active intelligence harvesting in regulated jurisdictions is subject to legal review, and the platform provides operational boundary guidance for customers evaluating it. The capability exists. The discipline around its use is part of what makes the capability defensible.
What this layer learns is the entire profile of a confirmed adversary: tools brought to the engagement, data sought, methods attempted, persistence patterns, response to synthetic outputs, sophistication indicators. That intelligence is assembled into a structured report at every engagement's close, fed into the platform's cross-agent intelligence model, and folded into the Reinforcement loop that hardens the posture for the next attacker.
Layer 5 — Continuous Corrigibility Measurement
The fifth layer answers a question the other four do not. The first four layers focus on the attacker — gradual compromise, coordinated patterns, discrete access, confirmed engagement. The fifth focuses on the deployed agents themselves: is the agent we deployed still the agent that's running?
Corrigibility, the property of an AI agent that allows it to be corrected, redirected, or stopped by its overseers without resistance, deception, or workaround, has been a philosophical category in alignment research for years. It has been hard to operationalize because no one has produced a way to measure it. IV calls its measurement the Corrigibility Index Score, or CIS: a deterministic, four-dimension, 0–100 numeric score computed by the platform's central inspection authority on every agent run from observable evidence in the audit chain. The score has no LLM in its computation path, so the same input events always produce the same score. The trajectory of CIS scores across runs is itself recorded immutably, which means corrigibility becomes auditable not just at the present moment but across an agent's entire operational history.
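The two properties claimed for CIS, determinism and an immutable trajectory, can be illustrated with a small sketch. The article states that CIS has four dimensions but does not name them or give the formula, so the dimension names, the equal weighting, the "no evidence means full marks" rule, and the SHA-256 hash chain below are all assumptions chosen only to show the shape of the idea.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Hypothetical dimension names: the article says CIS has four dimensions but does not name them.
DIMENSIONS = ("halt_compliance", "instruction_adherence", "transparency", "scope_discipline")


def corrigibility_index(events: list[dict]) -> int:
    """Deterministic 0-100 score from audit events; no model in the computation path."""
    dim_scores = []
    for dim in DIMENSIONS:
        relevant = [e for e in events if e["dimension"] == dim]
        if not relevant:
            dim_scores.append(100)  # assumption: no evidence against this dimension
            continue
        compliant = sum(1 for e in relevant if e["compliant"])
        dim_scores.append(100 * compliant // len(relevant))  # integer math, no float drift
    return sum(dim_scores) // len(DIMENSIONS)


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class CisTrajectory:
    """Append-only, hash-chained record of scores across runs."""
    entries: list[dict] = field(default_factory=list)

    def append(self, run_id: str, score: int) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {"run_id": run_id, "score": score, "prev": prev}
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every link; any edited score or reordered entry breaks the chain."""
        prev = "0" * 64
        for entry in self.entries:
            body = {"run_id": entry["run_id"], "score": entry["score"], "prev": entry["prev"]}
            if entry["prev"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True


events = [
    {"dimension": "halt_compliance", "compliant": True},
    {"dimension": "halt_compliance", "compliant": True},
    {"dimension": "instruction_adherence", "compliant": True},
    {"dimension": "instruction_adherence", "compliant": False},
]
score = corrigibility_index(events)  # 87
trajectory = CisTrajectory()
trajectory.append("run-1", score)
trajectory.append("run-2", 92)
print(score, trajectory.verify())
```

Integer arithmetic and sorted-key JSON keep both the score and the chain hashes reproducible for identical inputs, which is the "same events in, same score out" property in miniature.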
When a score falls below threshold or drops sharply between runs, the system does not merely alert. It investigates and classifies the cause into one of four categories: threat actor (the agent is being attacked), vague guidance (the operator's instructions were ambiguous), willful disobedience (the agent refused clear instructions), or model drift (the underlying model has changed behavior). Each classification routes to a different response — security containment, operator clarification, certification re-assessment, or improvement-program investigation. Same score, four very different stories, four very different responses.
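The routing half of this paragraph is simple enough to sketch directly, since the article names both the four causes and their responses. The investigation that decides which cause applies is not described, so it is not modeled here; the trigger thresholds (a floor of 70 and a run-over-run drop of more than 15 points) and all names are hypothetical.

```python
from enum import Enum
from typing import Optional


class Cause(Enum):
    THREAT_ACTOR = "threat actor"
    VAGUE_GUIDANCE = "vague guidance"
    WILLFUL_DISOBEDIENCE = "willful disobedience"
    MODEL_DRIFT = "model drift"


# Cause-to-response mapping as described in the article.
ROUTES = {
    Cause.THREAT_ACTOR: "security containment",
    Cause.VAGUE_GUIDANCE: "operator clarification",
    Cause.WILLFUL_DISOBEDIENCE: "certification re-assessment",
    Cause.MODEL_DRIFT: "improvement-program investigation",
}


def needs_investigation(previous: Optional[int], current: int,
                        floor: int = 70, max_drop: int = 15) -> bool:
    """Trigger on an absolute floor breach or a sharp run-over-run drop (hypothetical values)."""
    if current < floor:
        return True
    return previous is not None and previous - current > max_drop


def route(cause: Cause) -> str:
    """Map a classified cause to its response path."""
    return ROUTES[cause]
```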
What this layer learns is whether the platform's deployed agents continue to deserve the trust placed in them at deployment. It is the layer that turns corrigibility from philosophical property into operational measurement, and it is the layer most directly aligned with where AI regulation is heading. The next decade of AI governance will require organizations to demonstrate continuous corrigibility of their deployed agents. CIS is what that demonstration looks like.
What the Five Layers Do Together
The five layers are not redundant. Each catches what the others structurally cannot. Probabilistic detection catches gradual compromise. Pattern correlation catches multi-stream coordination. Deterministic detection catches discrete access regardless of patience. Active intelligence harvesting extracts forensic profiles from confirmed engagements. Corrigibility measurement catches drift in the deployed agents themselves. No known evasion path remains across all five profiles combined. That is the architectural claim, and every confirmed attacker engagement produces intelligence that strengthens every layer for the next attacker: the Reinforcement loop made operational.
This article is a map. Each layer described here deserves treatment in its own right, and subsequent articles in this series will go deeper on the Corrigibility Index Score, on the Pattern Detection Agency, and on the active intelligence layer individually. What this piece is meant to establish is the shape of the structure: the architectural commitment to a foundation produces, downstream, an architectural commitment to what gets built on top of it. The five layers are the visible evidence of that commitment.
The Question for 2026
The question facing CISOs, regulators, and boards in 2026 is no longer whether agentic AI will be deployed in their organizations; it already is. The question is whether the platform running that AI was built to detect, contain, and learn from the adversaries that will inevitably engage with it, or whether it was built to do the minimum and hope.
IV did not build for the minimum. The architecture described above is what gets built when an organization takes the security of agentic AI as seriously as regulators are about to require everyone to take it. It is also a defensive posture built specifically for AI-class adversaries, the threat the rest of the security industry is only beginning to recognize as the new baseline.
Five layers. One posture. One thesis: the luxury of learning, earned by the discipline of having minimized first.