
AI Agents in the Trust Stack

“Trust this AI” is the wrong frame. “Verify this output” is the right one. The three-layer model in this chapter is the precondition for everything in chapters 5–7.

In 2025–2026 most commercial discussions of “AI trust” collapse three distinct concerns into a single product surface. The collapse is the same failure mode that early monolithic auth systems suffered: a single mechanism is asked to bear weight that no single mechanism can bear.

This chapter argues that an AI agent operating under regulation needs three independent, orthogonal trust layers:

```mermaid
flowchart TB
    classDef layer fill:#e0e7ff,stroke:#3730a3,color:#1e1b4b

    A[Identity layer <br> WHO is the agent?]:::layer
    B[Provenance layer <br> WHAT did the agent do, on WHAT input?]:::layer
    C[Governance layer <br> WAS the agent ALLOWED to do that?]:::layer

    Q1[MCP <br> OIDC for agents <br> agent X.509 certs]
    Q2[Aletheia / EATF <br> .aep evidence <br> C2PA for media]
    Q3[Policy engines <br> Delegation chains <br> Action firewalls / kill-switches]

    A --- Q1
    B --- Q2
    C --- Q3
```

The three layers answer different questions, with different protocols, under different threat models, with different audit implications. Conflating them produces the same failure mode as conflating authentication with authorisation: the system looks comprehensive while leaving entire categories of post-hoc audit unanswerable.

The identity layer answers the question: who is acting? For human users this question has a 30-year answer (TLS client certificates, OAuth, SAML, OIDC, FIDO). For AI agents the answer is younger and less settled, but it is converging on three approaches:

  • Model Context Protocol (MCP) — Anthropic’s specification for exposing tools, resources, and prompts to AI agents. The MCP server is the capability surface; an MCP-compliant agent has a discoverable identity through the server’s registration. See modelcontextprotocol.io.
  • OIDC / OAuth for agents — emerging conventions that treat AI agents as first-class principals with their own client credentials, scopes, and audit identity. Many enterprise deployments take this path because the existing IAM infrastructure transfers directly.
  • Agent X.509 certificates — when agents need to participate in cryptographic protocols (TLS client auth, document signing, OAuth client credentials with mTLS), they get certificates of their own. The Estonian eID approach to humans applies, with caveats: agent certificates need narrower validity windows (rotation is cheaper for software than for humans), tighter EKU restrictions, and clear lifecycle bindings to the deploying human or organisation.
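The certificate caveats above can be expressed as a profile check. This is a minimal, stdlib-only sketch: the limits (a 30-day validity cap, the two EKU names) and the record fields are illustrative assumptions, not from any published agent-certificate profile.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical agent-certificate profile limits (illustrative values):
# short validity, narrow EKU set, explicit binding to a deploying org.
MAX_VALIDITY = timedelta(days=30)          # rotation is cheap for software
ALLOWED_EKUS = {"clientAuth", "documentSigning"}

def check_agent_cert(cert: dict) -> list[str]:
    """Return the list of profile violations for an agent-certificate record."""
    problems = []
    lifetime = cert["not_after"] - cert["not_before"]
    if lifetime > MAX_VALIDITY:
        problems.append(f"validity window {lifetime.days}d exceeds {MAX_VALIDITY.days}d")
    extra = set(cert["ekus"]) - ALLOWED_EKUS
    if extra:
        problems.append(f"EKUs outside agent profile: {sorted(extra)}")
    if not cert.get("deploying_org"):
        problems.append("no lifecycle binding to a deploying organisation")
    return problems

now = datetime.now(timezone.utc)
cert = {
    "not_before": now,
    "not_after": now + timedelta(days=365),   # human-style lifetime: too long
    "ekus": ["clientAuth", "serverAuth"],     # serverAuth not in the profile
    "deploying_org": "",                      # lifecycle binding missing
}
print(check_agent_cert(cert))
```

A real deployment would run these checks against a parsed X.509 certificate rather than a dict; the point is that the agent profile is narrower than the human one on every axis.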

Identity is not sufficient. Knowing who the agent is does not, on its own, prove what the agent did with that identity.

The provenance layer answers the question: what was the input, what was the output, and under what model and policy version were they produced? This is the layer EATF / Aletheia primarily addresses, and the layer the EU AI Act’s Article 12 (record-keeping) and Article 13 (transparency) most directly demand.

The provenance layer is complementary to identity: knowing who the agent is identifies the responsible party; the provenance bundle proves what that responsible party emitted at a specific time under specific conditions. A signed Evidence Package binds:

  • input hash · canonical hash of the agent’s input
  • output hash · canonical hash of the agent’s output
  • model identifier · semantic version + build hash
  • policy version · semantic version of governance policy in force
  • timestamp · RFC 3161 anchored
  • signing certificate · agent’s X.509 cert (the identity binding)

The output is verifiable offline by any third party with the public certificate chain.
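The binding and the offline check can be sketched end to end. This is a toy: an HMAC with a shared demo key stands in for the agent’s X.509 signature, and the local clock stands in for the RFC 3161 timestamp; the field names follow the list above.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

SECRET = b"demo-signing-key"  # stand-in: a real .aep signs with the agent's
                              # X.509 key pair and anchors time via RFC 3161

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def build_evidence_package(input_bytes, output_bytes, model_id, policy_version):
    """Bind input, output, model, policy, and time into one signed bundle."""
    body = {
        "input_hash": sha256_hex(input_bytes),
        "output_hash": sha256_hex(output_bytes),
        "model_id": model_id,                 # semantic version + build hash
        "policy_version": policy_version,     # governance policy in force
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    canonical = json.dumps(body, sort_keys=True).encode()
    sig = hmac.new(SECRET, canonical, hashlib.sha256).hexdigest()
    return {"body": body, "signature": sig}

def verify(package) -> bool:
    """Offline check: recompute the signature over the canonical body."""
    canonical = json.dumps(package["body"], sort_keys=True).encode()
    expected = hmac.new(SECRET, canonical, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, package["signature"])

pkg = build_evidence_package(b"prompt", b"draft email",
                             "model-1.2.0+abc123", "policy-3.1.0")
print(verify(pkg))
```

Tampering with any bound field (say, swapping the output hash) breaks verification, which is exactly the property the offline third-party check relies on.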

C2PA (Coalition for Content Provenance and Authenticity) addresses a similar problem for media — image / video / audio — and composes naturally with EATF for action provenance:

```mermaid
flowchart LR
    Cam[Camera capture] --> C2PA[C2PA-signed image]
    C2PA -->|input| AI[AI agent]
    AI --> Out[AI output]
    Out -->|input hash references C2PA bundle| EATF[EATF .aep]
    EATF --> Ver[Public verifier]
```

The composition is load-bearing: regulators want to trace the provenance chain from a final AI output back to the source media, through the agent’s policy at signing time, with every link independently verifiable. EATF + C2PA together get most of the way there.
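A minimal sketch of walking that chain, assuming toy records in place of real C2PA and .aep structures: only the hash links are modelled, and the field names are illustrative.

```python
import hashlib

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Hypothetical minimal records. The C2PA bundle anchors the media hash
# (signed by the capture device in reality); the EATF package's input
# hash must reference that bundle for the chain to hold.
image_bytes = b"raw camera capture"
c2pa_bundle = {"media_hash": h(image_bytes)}
eatf_package = {"input_hash": c2pa_bundle["media_hash"],
                "output_hash": h(b"ai output")}

def chain_verifies(media: bytes, c2pa: dict, aep: dict) -> bool:
    """Walk from the AI output back to the source media, link by link."""
    return (h(media) == c2pa["media_hash"]
            and aep["input_hash"] == c2pa["media_hash"])

print(chain_verifies(image_bytes, c2pa_bundle, eatf_package))
```

Each link is checked independently, so a regulator can reject the chain at the first hash that fails to match without trusting any intermediate party.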

4.4 Governance — was the agent allowed to do that?


The governance layer answers the question: was this action policy-permitted at the time of execution? This is the layer that enforces:

  • Delegation chains. A user delegates to an agent; the agent delegates to a sub-agent; the audit needs to know the chain.
  • Action firewalls. Before any irreversible action — payment, database write, external email send — a governance gate decides whether the agent is permitted to proceed under current policy. The Visa CLI’s authorize step is one example pattern; see life diary 2026-03-24.
  • Kill-switches. Operational controls that let a human halt an agent without negotiating with the agent itself.
  • Policy engines. The decision-making logic — Open Policy Agent, Cedar, custom rule engines — that decides whether a proposed action is allowed.
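An action firewall in miniature. The action names, the scope model, and the gate logic are illustrative assumptions; a production deployment would delegate the decision to a policy engine such as Open Policy Agent or Cedar, and sign each decision as an approval event.

```python
# Hypothetical set of irreversible actions that must pass the gate.
IRREVERSIBLE = {"payment", "db_write", "email_send"}

def gate(action: str, target: str, delegation_scope: set[str]) -> dict:
    """Decide, before execution, whether the agent may proceed."""
    if action not in IRREVERSIBLE:
        return {"decision": "allow", "reason": "reversible action"}
    if action not in delegation_scope:
        return {"decision": "deny",
                "reason": f"{action} on {target} is outside delegation scope"}
    return {"decision": "allow", "reason": "within delegation scope"}

print(gate("email_send", "x@example.com", {"email_send"}))
```

The essential property is ordering: the gate runs before the irreversible action, not after, so a deny is a halt rather than a log entry.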

Governance is not PKI. PKI can sign the policy artefact (so a regulator can later verify which policy was in force) and sign each approval event (so the audit trail is tamper-evident). But the content of policy decisions — what counts as a permitted action — is governance work, not provenance work.

EATF deliberately stops at the provenance boundary. The substrate binds approvals as signed events; the quality of those approvals is the deployer’s governance responsibility.

A worked example clarifies the cost of collapsing the layers. Consider an AI agent that, under user delegation, sends an email on the user’s behalf.

Identity-only thinking: “The email was sent from the user’s authenticated agent; we have logs proving the OAuth token was valid; therefore the email is authorised.” Wrong: the OAuth token proves the agent was who it claimed to be, not that the email’s content was within delegation scope.

Provenance-only thinking: “The agent produced an Evidence Package binding its input prompt to its output email; the package verifies; therefore the email is correct.” Wrong: the package proves what the agent did, not whether the agent was permitted to do it under the user’s policy.

Governance-only thinking: “The policy engine approved the send; the policy was in force; therefore the email is auditable.” Wrong: the policy approval proves a policy decision was made, not who made the decision (identity) or what evidence the decision was made on (provenance).

The three layers compose:

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant A as AI Agent
    participant Pol as Policy Engine
    participant EATF as EATF substrate
    participant Mail as Email service

    User->>A: Delegate task (email summary)
    A->>A: Generate draft (LLM inference)
    A->>Pol: Propose action: send email to X
    Pol->>Pol: Evaluate policy (delegation scope, content rules)
    Pol-->>A: Approved (signed approval event)
    A->>EATF: Emit .aep evidence (input, output, model, policy ver, approval ref)
    EATF-->>A: Signed package id
    A->>Mail: Send email + package id reference
    Mail-->>User: Email delivered with audit reference
```

Each step in the sequence is independently verifiable. A regulator who asks “did this agent send unauthorised emails on date X?” walks the audit ledger, finds the relevant package, and runs three checks: identity (is the agent certificate valid?), provenance (do the package signature and content hashes verify?), and governance (does the referenced policy approval exist, and was it itself signed?).
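Those three checks can be sketched as one audit walk. Each check function here is a placeholder for the real mechanism named in the text (certificate-chain validation, signature recomputation, approval-ledger lookup), and the record shapes are assumptions.

```python
from datetime import datetime, timedelta, timezone

def check_identity(cert: dict, now) -> bool:
    """Identity: is the agent certificate within its window and not revoked?"""
    return cert["not_before"] <= now <= cert["not_after"] and not cert["revoked"]

def check_provenance(package: dict, recomputed_sig: str) -> bool:
    """Provenance: does the stored signature match a fresh recomputation?"""
    return package["signature"] == recomputed_sig

def check_governance(package: dict, ledger: dict) -> bool:
    """Governance: does the referenced approval exist, and is it signed?"""
    ref = package["approval_ref"]
    return ref in ledger and ledger[ref]["signed"]

def audit(cert, now, package, recomputed_sig, ledger) -> bool:
    """The regulator's walk: all three layers must check out independently."""
    return (check_identity(cert, now)
            and check_provenance(package, recomputed_sig)
            and check_governance(package, ledger))

now = datetime.now(timezone.utc)
cert = {"not_before": now - timedelta(days=1),
        "not_after": now + timedelta(days=29), "revoked": False}
package = {"signature": "sig-ok", "approval_ref": "appr-42"}
ledger = {"appr-42": {"signed": True}}
print(audit(cert, now, package, "sig-ok", ledger))
```

Because the checks are conjunctive and independent, a failure pinpoints which layer broke, which is precisely what collapsing the layers makes impossible.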

MCP is sometimes positioned as a “trust layer for AI agents” by its proponents. This is partly right and partly misleading. MCP is firmly in the identity and capability space: it tells the world what an agent is, what tools it can use, what resources it can read. It does not, by design, address provenance or governance.

The right composition is: MCP for capability discovery and agent identity; EATF for provenance binding of agent outputs; a separate policy engine for governance decisions, with EATF signing the approval events.

```mermaid
flowchart LR
    classDef mcp fill:#fce7f3,stroke:#831843,color:#500724
    classDef eatf fill:#dbeafe,stroke:#1e40af,color:#1e3a8a
    classDef gov fill:#dcfce7,stroke:#15803d,color:#14532d

    M[MCP Server <br> identity + tools]:::mcp
    A[AI Agent]
    P[Policy Engine]:::gov
    E[EATF Substrate]:::eatf

    A -->|identity reg| M
    A -->|propose action| P
    P -->|signed approval| E
    A -->|emit output| E
    E -->|verifiable .aep| Reg[Regulator / Auditor]
```

This figure expresses the chapter’s thesis visually: three distinct circles of responsibility, with EATF as the integration point that produces verifiable artefacts.

4.7 What this chapter set up for chapters 5–7


The remaining chapters of the atlas all assume the three-layer model. Chapter 5 (EU AI Act) maps high-risk obligations primarily onto the provenance layer, with explicit mentions of where governance and identity take over. Chapter 6 (operational playbooks) gives concrete recipes for the EATF flow that lives in the provenance layer. Chapter 7 (sector verticals) shows the same layering in production across five domains.

Chapter 8 (open problems) returns to the boundary: the open problems for trust services in the AI-agent age all live at the joins between the three layers, not within any single layer.