Three independent checks run before every consequential action, and all three must pass. This is prevention: the action does not execute until all three gates open.
Is the agent following its expected workflow? Compare the current execution trace against registered baselines. If a required step was skipped (e.g., approve_payment is called without a prior verify_identity), the action is blocked.
Uses: context graphs, workflow baselines, drift detector
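A minimal sketch of this first gate, assuming the execution trace is available as an ordered list of tool names and that prerequisites are kept in a simple map from action to required prior steps. `PREREQUISITES`, `GateResult`, and `workflow_gate` are hypothetical names for illustration, not the product's API.

```python
from dataclasses import dataclass

# Hypothetical prerequisite map: gated action -> steps that must already
# appear in the session's execution trace before the action may run.
PREREQUISITES: dict[str, list[str]] = {
    "approve_payment": ["verify_identity"],
}


@dataclass
class GateResult:
    allowed: bool
    reason: str = ""


def workflow_gate(trace: list[str], action: str) -> GateResult:
    """Block `action` if any required prior step is missing from `trace`."""
    missing = [step for step in PREREQUISITES.get(action, []) if step not in trace]
    if missing:
        return GateResult(False, f"skipped required step(s): {', '.join(missing)}")
    return GateResult(True)
```

For example, `workflow_gate(["pull_credit_score"], "approve_payment")` returns a blocked result naming verify_identity, while the same call with verify_identity in the trace is allowed. Actions with no registered prerequisites pass this gate by default.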
Is the driving data clean? Check the taint labels on all data flowing into this decision. If any input is UNTRUSTED and the action is privileged (write, pay, deploy), the action is blocked.
Uses: taint propagation, 6-level trust taxonomy
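One common way to implement taint propagation is to give derived data the lowest trust level among its inputs. The sketch below assumes an ordered six-level scale; only UNTRUSTED comes from the text, and the other five level names, along with `propagate` and `taint_gate`, are hypothetical placeholders rather than the product's actual taxonomy.

```python
from enum import IntEnum


class Trust(IntEnum):
    # Hypothetical six-level ordering (lower = less trusted).
    # Only UNTRUSTED is named in the text; the rest are placeholders.
    UNTRUSTED = 0
    EXTERNAL = 1
    USER = 2
    INTERNAL = 3
    VERIFIED = 4
    SYSTEM = 5


PRIVILEGED = {"write", "pay", "deploy"}


def propagate(*inputs: Trust) -> Trust:
    """Derived data is only as trusted as its least-trusted input."""
    return min(inputs)


def taint_gate(input_labels: list[Trust], action_kind: str) -> bool:
    """Return True if allowed: privileged actions may not consume UNTRUSTED data."""
    if action_kind in PRIVILEGED and Trust.UNTRUSTED in input_labels:
        return False
    return True
```

A summary computed from a SYSTEM prompt and an UNTRUSTED web page inherits UNTRUSTED, so `taint_gate([propagate(Trust.SYSTEM, Trust.UNTRUSTED)], "pay")` is blocked, while the same input feeding a read is allowed.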
Has this session accumulated too much risk? Track a cumulative, time-decayed risk score for the session. If the session has already produced multiple suspicious calls or blocks, this action is blocked even if it looks clean in isolation.
Uses: session risk accumulator, time-decay scoring
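The text does not specify the decay function. A minimal sketch, assuming exponential decay with a configurable half-life and a fixed blocking threshold (`SessionRisk`, `half_life_s`, and `threshold` are hypothetical names and defaults):

```python
import math


class SessionRisk:
    """Cumulative session risk score whose past contributions decay exponentially."""

    def __init__(self, half_life_s: float = 300.0, threshold: float = 1.0):
        self.decay_rate = math.log(2) / half_life_s  # per-second decay constant
        self.threshold = threshold
        self.score = 0.0
        self.last_t = None  # time of the last update, in seconds

    def _decay_to(self, t: float) -> None:
        # Shrink the existing score by the time elapsed since the last update.
        if self.last_t is not None:
            self.score *= math.exp(-self.decay_rate * (t - self.last_t))
        self.last_t = t

    def record(self, t: float, risk: float) -> None:
        """Add the risk of a suspicious call or block observed at time t."""
        self._decay_to(t)
        self.score += risk

    def allows(self, t: float) -> bool:
        """Gate check at time t: block once the decayed score reaches the threshold."""
        self._decay_to(t)
        return self.score < self.threshold
```

Two suspicious calls of 0.6 ten seconds apart push the score above 1.0, so the next gated action is blocked even if it is clean on its own; after roughly two half-lives without further incidents the score falls back below the threshold and actions are allowed again.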
Not every tool call triggers the safety gates. Read operations pass through normally. The gates activate for actions that are irreversible, financial, or destructive.
Any database mutation
Outbound communication
Financial transfers
Payment approvals
Code deployment
Data deletion
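Putting the pieces together, the dispatch rule described above can be sketched as: ungated calls such as reads pass straight through, and calls in a gated category execute only if every gate opens. The category keys and the `authorize` function are hypothetical; each gate is modeled as any callable that returns True when it opens.

```python
from typing import Callable

# Hypothetical category keys mirroring the gated action list above.
GATED_CATEGORIES = {
    "db_mutation", "outbound_communication", "financial_transfer",
    "payment_approval", "code_deployment", "data_deletion",
}

Gate = Callable[[dict], bool]  # a gate returns True when it opens


def authorize(call: dict, gates: list[Gate]) -> bool:
    """Reads and other ungated calls pass; gated calls need every gate open."""
    if call["category"] not in GATED_CATEGORIES:
        return True
    # Evaluate every gate (no short-circuit), then require all of them to open.
    verdicts = [gate(call) for gate in gates]
    return all(verdicts)
```

With three gates where only the taint check fails, a `payment_approval` call is refused while a `read` call is never checked. Evaluating all gates rather than stopping at the first closed one is a design choice assumed here; it keeps every verdict available if a real system wants them for auditing.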
Every agent session builds a directed acyclic graph of tool calls, data flows, and decisions. The graph records what happened, what data flowed where, and what credentials were used at each step.
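A minimal sketch of such a context graph, assuming it is append-only: a new step may only take inputs from steps that already exist, which guarantees the graph stays acyclic. `ContextGraph`, `add_step`, and `upstream` are hypothetical names, not the product's data model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    id: int
    tool: str
    credential: Optional[str] = None  # credential used at this step, if any


@dataclass
class ContextGraph:
    """Append-only DAG: edges only point from earlier steps to later ones."""
    nodes: list[Node] = field(default_factory=list)
    edges: list[tuple[int, int, str]] = field(default_factory=list)  # (src, dst, data label)

    def add_step(self, tool: str, inputs: tuple[int, ...] = (), data: str = "",
                 credential: Optional[str] = None) -> int:
        node_id = len(self.nodes)
        # Validate before mutating, so a bad call leaves the graph unchanged.
        if any(not 0 <= src < node_id for src in inputs):
            raise ValueError("inputs must reference earlier steps")
        self.nodes.append(Node(node_id, tool, credential))
        self.edges.extend((src, node_id, data) for src in inputs)
        return node_id

    def upstream(self, node_id: int) -> set[int]:
        """All steps whose data flowed, directly or transitively, into node_id."""
        seen, stack = set(), [node_id]
        while stack:
            cur = stack.pop()
            for src, dst, _ in self.edges:
                if dst == cur and src not in seen:
                    seen.add(src)
                    stack.append(src)
        return seen
```

For a session that reads an email, summarizes it, and sends a reply with an SMTP credential, `upstream` on the send step returns both earlier steps. That transitive lookup is the kind of query a taint check would need to find every input behind a decision.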
The drift detector compares the live graph against registered baselines. For example, the expected workflow for "loan evaluation" is: pull credit score, verify identity, check balance, then approve. If the agent skips verify_identity, the drift detector catches it before the approval executes.
Register expected workflows such as email summarization, loan evaluation, and data pipelines. Each baseline defines required steps, forbidden steps, and ordering constraints, and the system checks conformance at every step.
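A baseline with required steps, forbidden steps, and ordering constraints might be modeled as below, with each proposed step checked against the trace before it runs. This is a sketch under assumptions: `Baseline`, `from_sequence`, `check_step`, `missing_required`, and the forbidden `delete_record` step are all hypothetical, and the step names only approximate the loan evaluation workflow described above.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class Baseline:
    name: str
    required: set[str]
    forbidden: set[str] = field(default_factory=set)
    ordering: list[tuple[str, str]] = field(default_factory=list)  # (a, b): a must precede b


def from_sequence(name: str, steps: list[str], forbidden: Iterable[str] = ()) -> Baseline:
    """Build a baseline in which each step must precede every step listed after it."""
    pairs = [(a, b) for i, a in enumerate(steps) for b in steps[i + 1:]]
    return Baseline(name, set(steps), set(forbidden), pairs)


def check_step(baseline: Baseline, trace: list[str], step: str) -> list[str]:
    """Drift violations caused by running `step` next, given the trace so far."""
    violations = []
    if step in baseline.forbidden:
        violations.append(f"forbidden step: {step}")
    for before, after in baseline.ordering:
        if step == after and before not in trace:
            violations.append(f"{after} requires {before} first")
    return violations


def missing_required(baseline: Baseline, trace: list[str]) -> set[str]:
    """Required steps that never ran; checked when the workflow ends."""
    return baseline.required - set(trace)


LOAN = from_sequence(
    "loan evaluation",
    ["pull_credit_score", "verify_identity", "check_balance", "approve"],
    forbidden={"delete_record"},  # hypothetical forbidden step
)
```

If the agent skips verify_identity, `check_step(LOAN, ["pull_credit_score", "check_balance"], "approve")` returns `["approve requires verify_identity first"]`, so the approval can be blocked before it executes. Expanding the sequence into all earlier/later pairs, rather than only adjacent ones, is what lets the check fire at the approval itself and not just at the next step.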
Three baselines are built in; add custom baselines via the API or dashboard.