Foundations of AI Runtime Security

Table of Contents

The new threat surface
#

AI systems have moved beyond text generation. Modern AI agents control hardware, execute shell commands, access files, browse the web, and integrate with messaging platforms. This expansion of capability is simultaneously an expansion of attack surface.

The OECD’s AI Incident Monitor tracked a 21.8× increase in AI security incidents between 2022 and 2024. This acceleration isn’t surprising — as AI gains more authority over digital and physical systems, the value of compromising those systems grows proportionally.

The threat landscape for AI runtime security spans several distinct categories:

Prompt injection — Malicious instructions embedded in data (emails, documents, web pages) that hijack AI agent behavior. When an agent reads a document containing hidden instructions, it may execute those instructions as if they came from its operator.

Model extraction and theft — Proprietary model weights represent millions of dollars in training investment. Without protection, inference APIs can be probed to reconstruct model capabilities, and unencrypted weights on disk or in memory can be exfiltrated by compromised infrastructure.

Data poisoning — Training data manipulation that embeds backdoors or biases into model behavior. Poisoned models may behave normally during testing but act maliciously on specific trigger inputs.

Supply chain attacks — Malicious plugins, compromised dependencies, and tampered model checkpoints. The OpenClaw ecosystem’s ClawHub marketplace saw approximately 20% of listed extensions flagged as malicious in early 2026.

Side-channel attacks — Timing, power, and electromagnetic analysis that extract secrets from running computations. GPU memory residuals can leak model weights and training data across tenants in shared infrastructure.

The three-layer defense model
#

Effective AI runtime security requires defense in depth across three complementary layers. Each addresses different threat vectors and operates at different points in the trust hierarchy.

Layer 1: Software runtime hardening
#

The most immediately deployable layer. Software hardening wraps existing AI systems with protection mechanisms that don’t require hardware changes:

Tool call blocking — Preventing dangerous operations (file deletion, network access, privilege escalation) before execution based on policy rules
Prompt injection detection — Identifying instruction-like content in data inputs using classifier models and heuristic rules
PII redaction — Removing sensitive data from agent outputs and logs
Human approval gates — Requiring operator authorization for high-risk actions (financial transactions, system modifications, external communications)
Tamper-evident audit logging — Maintaining integrity-protected records of all agent actions for forensic analysis

Software hardening provides immediate value but has fundamental limitations. Protection runs alongside the code it protects — a sufficiently capable attacker who compromises the runtime can disable the protections themselves.

Layer 2: Confidential computing (TEE-based protection)
#

Trusted Execution Environments create hardware-isolated enclaves where computation occurs in encrypted memory that even the host operating system and hypervisor cannot read:

CPU TEEs — Intel TDX (Trust Domain Extensions) and AMD SEV-SNP (Secure Encrypted Virtualization — Secure Nested Paging) create isolated virtual machines with encrypted memory and integrity verification
GPU TEEs — NVIDIA Confidential Computing extends TEE protection to GPU workloads, encrypting data on the PCIe bus and within GPU memory. Supported on H100, H200, and B200 GPUs
Remote attestation — Cryptographic verification that a TEE is running expected code on genuine hardware, without modification. Intel Trust Authority and similar services provide independent attestation verification
Secure key release — Encryption keys are only released to workloads that pass attestation, ensuring sensitive data is only accessible in verified environments

Confidential computing solves the “data in use” problem — protecting models and data while they’re being processed, not just at rest and in transit. Fortanix, EQTY Lab, Phala Network, and others have deployed production confidential AI systems in 2025–2026.

The limitation is performance overhead (though improving rapidly — NVIDIA’s Blackwell architecture achieves near-native performance for confidential workloads) and the requirement for TEE-capable hardware.

Layer 3: Hardware capability architecture (CHERI)
#

The most comprehensive layer, requiring purpose-built silicon but providing deterministic guarantees that software and TEEs cannot match:

Capability pointers — Every memory reference becomes an unforgeable, bounded token of authority. A pointer that is authorized to access bytes 1000–2000 physically cannot be used to read byte 2001, regardless of software bugs
Fine-grained compartmentalization — AI subsystems run in hardware-enforced compartments with explicit boundary policies. A compromised plugin cannot access memory belonging to another component, even within the same process
Deterministic enforcement — Protection is not probabilistic (like Arm MTE’s 4-bit tags) or bypassable through secret leakage. CHERI capabilities are enforced by the processor pipeline on every memory access
Near-zero overhead — Moving enforcement into hardware keeps protection predictable enough for real-time workloads. Codasip’s CHERI-RISC-V core demonstrates negligible performance impact

CHERI is supported by the University of Cambridge, SRI International, Arm (Morello), Microsoft (CHERIoT), the UK’s Digital Security by Design programme, and the CHERI Alliance. Commercial silicon is on the horizon but not yet at scale.

The regulatory pull
#

Global AI safety regulation is creating mandatory demand for demonstrable security measures:

Regulation	Region	Key security requirement
EU AI Act	EU	Demonstrable security measures for high-risk AI systems
AI Computing Platform Security Framework	China	Chip-level trusted computing as security foundation
NIST AI RMF	US	Risk management framework with security controls
Digital Security by Design	UK	Hardware-enforced memory safety programme
Data Security Law / PIPL	China	Data protection with technical measures

Regulatory compliance is shifting from “document your policies” to “demonstrate your controls.” Hardware-enforced and cryptographically verifiable security measures satisfy auditors in ways that software-only policies cannot.

Where this is heading
#

The AI security stack is converging on a layered model where software hardening provides immediate protection, confidential computing protects data and models during computation, and hardware capabilities provide architectural guarantees for the most critical systems.

Organizations deploying AI agents today should:

Start with software hardening — Deploy runtime protection (tool blocking, prompt injection detection, audit logging) immediately
Evaluate confidential computing — For workloads handling sensitive models or data, TEE-based protection is production-ready today
Plan for hardware capabilities — Track CHERI ecosystem progress and evaluate where deterministic memory safety fits in your long-term architecture

The cost of AI security incidents is measured not just in data breaches but in physical safety, regulatory penalties, and erosion of trust in AI systems broadly. Infrastructure-level security is becoming a market requirement, not a feature.

The new threat surface#

The three-layer defense model#

Layer 1: Software runtime hardening#

Layer 2: Confidential computing (TEE-based protection)#

Layer 3: Hardware capability architecture (CHERI)#

The regulatory pull#

Where this is heading#