Open Problems
The atlas argues that PKI gets us most of the way there. “Most” is not “all”. Saying so explicitly is the difference between a practitioner’s atlas and a vendor pitch.
8.1 What this chapter does
The preceding seven chapters argue that a substantial portion of EU AI Act high-risk obligations can be discharged with PKI-native infrastructure. We have been honest about the boundaries throughout: Article 10’s data-quality dimension is governance, Article 14’s oversight is HCI, Article 15’s runtime concerns are detection and response. This closing chapter assembles those boundary observations into an explicit open-problem list, both as an act of intellectual honesty and as a research-direction prompt for the next round of standards work.
The six problems below are the questions the atlas could not answer with PKI alone. Each is an invitation to a working group, a research project, or — for some — a follow-up paper.
```mermaid
mindmap
  root((Open problems <br> outside the PKI lens))
    Article 10 data-quality
      Population fitness
      Bias detection
      Train/test leakage
    Article 14 human oversight
      Meaningful approval semantics
      HCI for AI gating
      Anti-rubberstamp design
    Cross-jurisdictional interop
      AI evidence trusted lists
      Inter-supervisor recognition
      Regulator tooling parity
    PQC longevity
      30-year retention horizons
      Post-quantum re-signing
      Algorithm graveyard policy
    Agent-of-agent delegation
      Responsible authority
      Chain auditability
      Sub-agent identity
    Privacy-preserving evidence
      Selective disclosure
      Verifier-binding
      Deployer-anonymity
```
8.2 Article 10 data-quality dimension
The narrow PKI reading of Article 10 — adopted throughout this atlas (and in the companion preprint) — covers provenance and integrity of the dataset artefact, not its content. PKI cannot answer:
- Is the training data representative of the deployment population?
- Is it leakage-free (no test contamination, no cross-customer data leakage)?
- Does it contain bias in any technically defensible sense?
- Does the data-management process satisfy the spirit of Article 10 beyond the letter?
A standardisation question follows: can the trust-services regime be extended with signed, structured fitness claims — perhaps under an emerging ETSI EN 319 4xx series — that bind a deployer’s quality assertion about a dataset to the same audit trail? The answer is non-trivial because quality is harder to formalise than integrity, and most candidate frameworks (statistical fairness metrics, robustness scores, leakage tests) are domain-specific.
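To make the shape of the question concrete, here is a minimal sketch of what a signed fitness claim could look like. Everything in it is hypothetical: the field names, the Ed25519/JSON choices, and the naive canonicalisation all stand in for whatever an eventual ETSI profile would actually specify.

```python
import hashlib
import json

# Third-party: pip install cryptography
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey


def make_fitness_claim(dataset_bytes: bytes, signing_key: Ed25519PrivateKey) -> dict:
    """Bind a deployer's quality assertion to a dataset artefact via its hash.

    All field names are illustrative, not from any ETSI draft.
    """
    claim = {
        "dataset_sha256": hashlib.sha256(dataset_bytes).hexdigest(),
        "claims": {
            "population": "EU retail credit applicants, 2020-2025",
            "leakage_test": {"method": "group-k-fold", "passed": True},
            "fairness": {"metric": "demographic-parity-gap", "value": 0.03},
        },
        "asserted_by": "deployer-example-org",
    }
    # Naive canonicalisation; a real profile would need JCS (RFC 8785) or CBOR.
    payload = json.dumps(claim, sort_keys=True, separators=(",", ":")).encode()
    return {"claim": claim, "sig": signing_key.sign(payload).hex()}


key = Ed25519PrivateKey.generate()
package = make_fitness_claim(b"...dataset bytes...", key)
```

The hard part is not the signature but the `claims` body: which metrics, thresholds, and test methods a profile should mandate is exactly the open question above.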
A reasonable next step is a Working Group looking at what could be borrowed from medical-device data-governance regimes (GMP, ISO 13485), which have decades of experience binding documented quality processes to product release.
8.3 Article 14 human-oversight binding
Approval events are trivially signable. Meaningful oversight is not. EATF gives us a verifiable record that a human approved at time T under policy P; it does not make the approval informed, deliberate, or non-rubberstamped. The HCI literature has been raising this concern for at least a decade — known as “automation bias” or “rubber-stamp consent” — and there is no clean cryptographic answer to it.
Open questions:
- What does a meaningful approval flow look like at the UI level? Estonian DigiDoc-style UX that enforces “you must read for N seconds before signing”? Substantive question prompts? Adversarial review attached to high-stakes approvals?
- Can we measure the meaningfulness of an approval, ex post, in a way the audit trail can record?
- How do we distinguish advisory AI from deciding AI in the audit, given that the same agent can move between modes?
This is HCI / policy work, not PKI work — but the audit trail needs to carry the resulting evidence, and that is where the two layers meet.
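As a sketch of where the layers meet, consider an approval record that carries HCI-layer evidence alongside the signable facts. The field names and the heuristics below are illustrative and not part of EATF; the thresholds are policy decisions, not PKI.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ApprovalEvent:
    """Hypothetical approval record: signable facts plus HCI-layer evidence."""
    approver: str
    policy_version: str
    decision: str            # "approve" | "reject"
    artefact_hash: str       # what was approved
    dwell_seconds: float     # how long the review UI was actually open
    challenge_answered: bool # substantive question prompt passed
    ts: float = field(default_factory=time.time)


def meaningfulness_flags(ev: ApprovalEvent, min_dwell: float = 30.0) -> list[str]:
    """Ex-post heuristics the audit trail could record alongside the signature."""
    flags = []
    if ev.dwell_seconds < min_dwell:
        flags.append("dwell-below-threshold")
    if not ev.challenge_answered:
        flags.append("no-substantive-challenge")
    return flags
```

The point of the sketch is the division of labour: the HCI layer produces `dwell_seconds` and `challenge_answered`, and the PKI layer merely makes them tamper-evident.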
8.4 Cross-jurisdictional trust-list interoperability
The EU’s Trusted Lists ecosystem (LOTL → national TL → TSP → certificate, see Chapter 2) provides a robust trust-anchor distribution mechanism for trust services. There is no analogous global registry for AI evidence-signing CAs as a specialised category.
Open questions:
- Should AI evidence-signing CAs form their own qualified-tier category, with explicit evidence-format requirements (CBOR-shape EATF-compatible packages? CAdES-shape .aep?) and explicit cross-jurisdictional recognition?
- How do non-EU jurisdictions (UK ICO, US NIST AI RMF ecosystem, Japan METI guidelines) participate, given that AI agents cross borders by default?
- What’s the right governance body? ENISA is plausible for the EU; IEEE / ISO for global; the PKI Consortium for an industry-led approach.
The answer probably involves a meta-trust-list of AI evidence authorities, layered above the existing eIDAS Trusted Lists. The engineering work is small; the governance work is hard.
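A minimal sketch of what such a meta-trust-list could look like, assuming a simple JSON shape. The entry fields and format identifiers are invented for illustration; the EU anchor points at the published LOTL location, while the UK entry is a placeholder.

```python
# Hypothetical meta-trust-list: a layer above the eIDAS Trusted Lists that
# indexes AI evidence-signing CAs by jurisdiction and evidence format.
META_TRUST_LIST = {
    "version": 1,
    "entries": [
        {
            "jurisdiction": "EU",
            "anchor": "https://ec.europa.eu/tools/lotl/eu-lotl.xml",  # published EU LOTL
            "ca_id": "example-ai-evidence-ca",
            "formats": ["eatf+cbor", "aep+cades"],
        },
        {
            "jurisdiction": "UK",
            "anchor": "https://example.invalid/uk-ai-trust-list",  # placeholder
            "ca_id": "example-uk-evidence-ca",
            "formats": ["eatf+cbor"],
        },
    ],
}


def accepted_formats(ca_id: str, meta: dict = META_TRUST_LIST) -> list[str]:
    """Which evidence formats a given CA is recognised for, across jurisdictions."""
    return sorted(
        {fmt for e in meta["entries"] if e["ca_id"] == ca_id for fmt in e["formats"]}
    )
```

As the prose says, the engineering is this small; deciding who publishes and maintains `META_TRUST_LIST` is the hard part.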
8.5 PQC + audit-ledger longevity (the 30-year horizon)
The AI Act high-risk regime implies retention windows that are plausibly decades-long. A signed evidence package made today and audited in 2055 traverses three distinct cryptographic eras:
- Today (2026): classical (RSA-PSS, ECDSA) is dominant; hybrid PQC is rolling out.
- Mid-term (2035): classical may be deprecated for new signatures, but legacy signatures must still verify; pure PQC is common.
- Long-term (2050): the PQC algorithms standardised in 2024 may themselves be deprecated, replaced, or superseded.
The technical question is: what does a 30-year retention policy look like under simultaneous classical and PQC algorithm deprecation? PAdES-LTA-style archival timestamping — periodically adding new timestamps with current algorithms over the entire signed structure — answers part of the problem, but not all of it: the original signing keys may have been deleted, the original CAs may be gone, and re-signing the original content under a new algorithm is not generally possible without the original signer present.
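A minimal sketch of the LTA-style mechanics, with local hashes standing in for the RFC 3161 timestamp tokens a qualified TSA would actually issue: each new archival timestamp covers the original artefact plus every earlier timestamp, using whichever hash algorithm is current at the time.

```python
import hashlib
import time


def extend_archive(artefact: bytes, chain: list[dict], hash_name: str) -> list[dict]:
    """Add one archival 'timestamp' covering the artefact and all prior timestamps.

    A real implementation would obtain an RFC 3161 token from a qualified TSA;
    the local digest here only illustrates the chaining structure.
    """
    h = hashlib.new(hash_name)
    h.update(artefact)
    for ts in chain:  # bind every earlier timestamp under the new algorithm
        h.update(bytes.fromhex(ts["digest"]))
    chain.append({"alg": hash_name, "digest": h.hexdigest(), "ts": time.time()})
    return chain


chain: list[dict] = []
chain = extend_archive(b"evidence package", chain, "sha256")    # today
chain = extend_archive(b"evidence package", chain, "sha3_512")  # after a migration
```

Note what the chain does and does not give you: each link re-attests that the artefact existed before the new timestamp, but nothing in it resurrects a deleted signing key or a defunct CA, which is exactly the residual problem named above.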
Open questions:
- Should there be a trusted preservation service category under eIDAS that takes responsibility for maintaining the cryptographic validity of long-retained artefacts across algorithm transitions?
- What does witness signing — a trusted third party re-attesting to the historical existence of a signed artefact — look like operationally?
- For AI evidence specifically, can we accept some forms of partial archival where only the canonical content hash is preserved cryptographically and the original signature is carried as historical metadata?
This is a research direction we expect to occupy regulators and trust-service providers through the 2030s.
8.6 Agent-of-agent delegation and responsible authority
Most current architecture assumes a one-step delegation: a human delegates to an agent, the agent acts. Real-world AI deployments increasingly involve multi-step delegation: a user delegates to an orchestrating agent, which spawns sub-agents, which call tools through other agents, in chains of varying depth.
The audit-trail question is: when a sub-sub-agent does something auditable, who is the responsible authority? PKI gives us a naive answer (the certificate that signed the sub-sub-agent’s output), but the legal and operational answer is more nuanced:
- The user who started the delegation has the legal interest.
- The deploying organisation has the regulatory obligation under Article 26.
- The sub-sub-agent has the cryptographic identity but no agency in any meaningful sense.
EATF currently models this as a chain: the dep field identifies the deployer, the agt field identifies the immediate agent, and the audit ledger preserves the call chain through event references. But the chain is brittle: a deeply nested delegation produces deeply nested evidence, and unwinding it for audit is operationally painful.
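The following sketch shows the unwinding problem in miniature. The record shape is illustrative; only the dep/agt vocabulary mirrors the EATF fields described above, and the parent reference stands in for the ledger’s event references.

```python
# Hypothetical evidence events forming a three-deep delegation chain.
EVENTS = {
    "ev-1": {"dep": "acme-corp", "agt": "orchestrator", "parent": None},
    "ev-2": {"dep": "acme-corp", "agt": "research-subagent", "parent": "ev-1"},
    "ev-3": {"dep": "acme-corp", "agt": "tool-caller", "parent": "ev-2"},
}


def delegation_chain(event_id: str, events: dict) -> list[str]:
    """Walk parent references back to the root; reject cycles explicitly."""
    chain, seen = [], set()
    cur = event_id
    while cur is not None:
        if cur in seen:
            raise ValueError(f"delegation cycle at {cur}")
        seen.add(cur)
        chain.append(events[cur]["agt"])
        cur = events[cur]["parent"]
    return list(reversed(chain))


assert delegation_chain("ev-3", EVENTS) == [
    "orchestrator", "research-subagent", "tool-caller",
]
```

Even this toy version has to handle cycles; at real-world depths, with events scattered across ledgers and deployers, the walk becomes the operational pain the prose describes.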
Open questions:
- Is there a standardised way to express delegation chains in the evidence format itself, similar to the actor and on-behalf-of claims of OAuth token exchange (RFC 8693)? What about cycles?
- How do we audit agent-initiated delegations (where one agent decides to consult another) versus user-initiated ones?
- What does “kill the agent” mean when the agent is multi-tier?
8.7 Privacy-preserving evidence packages
Today’s .aep packages identify the deployer, often the agent, and sometimes — through input/output hashes — the user. This is operationally fine for most scenarios but philosophically uncomfortable: a regulator who fetches a verifier URL learns more about the deployer than the regulator perhaps needs to know.
Open questions:
- Can we adapt selective disclosure schemes (BBS+ signatures, CL-style attribute hiding) to evidence packages, so that a verifier can confirm “this output was signed by some deployer of class C, under policy version V” without learning which deployer?
- What does verifier-binding look like — where the same evidence package reveals more to a regulator than to a bystander?
- For sectors with strong privacy requirements (medical, legal, finance), do we need a privacy-preserving variant of EATF before partner-integration onboarding?
Selective disclosure is technically tractable in 2026 but operationally premature; we expect this to be the most active research area in trust services through 2027–2028.
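For intuition, here is a salted-hash sketch in the spirit of SD-JWT, a deliberately simpler stand-in for the BBS+ machinery above: the signer commits only to hashed attributes, and the deployer later discloses a chosen subset with salts so a verifier can recompute the hashes.

```python
import hashlib
import secrets


def commit(name: str, value: str) -> tuple[str, dict]:
    """Commit to one attribute as hash(salt | name | value); keep the opening."""
    salt = secrets.token_hex(16)
    digest = hashlib.sha256(f"{salt}|{name}|{value}".encode()).hexdigest()
    return digest, {"salt": salt, "name": name, "value": value}


attributes = {"deployer": "acme-corp", "deployer_class": "C", "policy_version": "V3"}
commitments, openings = {}, {}
for name, value in attributes.items():
    commitments[name], openings[name] = commit(name, value)

# The evidence package carries (and the CA signs) only `commitments`.
# To prove "some deployer of class C, under policy version V3" without
# naming the deployer, disclose just these two openings:
disclosed = {k: openings[k] for k in ("deployer_class", "policy_version")}


def verify_disclosure(commitments: dict, disclosed: dict) -> bool:
    """Recompute each disclosed hash and check it against the signed commitment."""
    return all(
        hashlib.sha256(f"{o['salt']}|{o['name']}|{o['value']}".encode()).hexdigest()
        == commitments[name]
        for name, o in disclosed.items()
    )


assert verify_disclosure(commitments, disclosed)
```

This hash-based variant leaks the number and names of withheld attributes, which is part of why BBS+-style schemes remain the research target rather than a solved problem.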
8.8 What this means for the practitioner
If you take one thing from this chapter, take this: PKI is necessary, not sufficient, for AI Act high-risk compliance and for AI-agent governance more broadly. A team that understands the boundaries — and engineers honestly to them — is in much better shape than a team that promises substrate completeness and discovers the boundaries during their first regulatory audit.
The substrate is one of the easier pieces to get right. The governance, oversight, and data-quality work above the substrate is where the next decade of trustworthy-AI engineering happens.
8.9 Outgoing links
- → Preprint §8 · the academic version of this chapter, with explicit forward references to follow-up papers.
- → Aletheia / EATF roadmap · which of these problems we are actively investigating and which we defer.