AI Makes the Problem Worse#
AI systems don’t just inherit the memory safety risks of their underlying software — they amplify them. The AI stack introduces new attack surfaces, higher-stakes failure modes, and unique operational constraints that make traditional defenses insufficient.
Why this matters for AGIACC: this is where classic systems security becomes an AI company problem. In embodied and operational settings, memory corruption is no longer just a bug class; it is a business, safety, and regulatory risk.
The AI Software Stack Is Memory-Unsafe#
Modern AI infrastructure is built overwhelmingly in memory-unsafe languages:
| Layer | Primary Language(s) | Memory-Unsafe? |
|---|---|---|
| GPU drivers (CUDA, ROCm) | C/C++ | ✓ |
| AI frameworks (PyTorch, TensorFlow) | C++ core, Python API | ✓ (native code) |
| NCCL / communication libraries | C/C++ | ✓ |
| Custom CUDA kernels | C/C++ (CUDA) | ✓ |
| Model serving (TensorRT, ONNX Runtime) | C++ | ✓ |
| Operating system kernel | C | ✓ |
Even when the user-facing API is Python, the performance-critical path — tensor operations, memory management, GPU kernel launches, collective communication — executes native C/C++ code.
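The boundary is easy to see from Python itself. As a minimal sketch (assuming a POSIX system where `ctypes.CDLL(None)` exposes libc; the framework case crosses the same boundary through its own extension modules), the call below hands control to native C code that the Python runtime cannot bounds-check:

```python
import ctypes

# Load the C library as a stand-in for a framework's native core.
libc = ctypes.CDLL(None)
libc.strlen.restype = ctypes.c_size_t
libc.strlen.argtypes = [ctypes.c_char_p]

# This call executes native C code with no Python-level bounds or
# lifetime checks; correctness rests entirely on the C implementation.
n = libc.strlen(b"tensor")
print(n)  # 6
```

Every tensor operation, kernel launch, and collective call in a training job makes this same kind of transition, at far higher volume.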
Specific Risk Areas#
GPU Driver Vulnerabilities#
GPU drivers are among the most complex and privileged software components in an AI system, so vulnerabilities in them have direct consequences:
- CVE-2025-0072 (Arm Mali): A use-after-free in the Mali GPU driver was demonstrated in 2025 to bypass MTE on Pixel devices, achieving privilege escalation from an unprivileged app.
- NVIDIA CUDA driver vulnerabilities can expose GPU memory across process boundaries, leaking model weights and training data.
Model Serving and Inference#
Model serving infrastructure processes untrusted inputs (user prompts, sensor data) at scale:
- Buffer overflows in tensor deserialization can be triggered by malformed model files or input tensors.
- Custom operators in production models are often hand-written C++ kernels that bypass framework safety checks.
- Pickle/Protobuf deserialization vulnerabilities in model loading paths can lead to arbitrary code execution.
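The pickle risk is mechanical, not theoretical: `pickle.loads` invokes whatever callable a hostile file's `__reduce__` names. A minimal sketch, using `eval` as a harmless stand-in for a real payload such as `os.system`:

```python
import pickle

class Malicious:
    # pickle calls __reduce__ to decide how to reconstruct the object.
    # A hostile file can return any callable plus arguments, and
    # pickle.loads will invoke it during deserialization.
    def __reduce__(self):
        return (eval, ("40 + 2",))

blob = pickle.dumps(Malicious())

# Merely *loading* the blob runs the attacker's callable.
result = pickle.loads(blob)
print(result)  # 42
```

This is why loading model checkpoints from untrusted sources with pickle-based formats is equivalent to running untrusted code.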
Multimodal and Agent Systems#
AI systems that process images, video, and audio, and that execute tools (like OpenClaw), combine multiple native parsers and execution engines:
- Image decoders (libjpeg, libpng) have decades of CVE history.
- Code execution sandboxes are often bypassable through memory safety exploits in the sandbox implementation itself.
- Agent frameworks that control hardware (robots, vehicles) turn software vulnerabilities into physical safety hazards.
Why AI Constraints Defeat Software Defenses#
AI workloads have unique constraints that make software-based memory safety defenses impractical:
Real-Time Requirements#
Autonomous vehicles, industrial robots, and surgical systems require deterministic latency. AddressSanitizer’s 2× overhead or garbage collection pauses are unacceptable.
GPU Memory Pressure#
AI models consume all available GPU memory. Sanitizer shadow memory, bounds tables, or GC metadata compete with model weights and activations for scarce GPU RAM.
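A rough back-of-envelope, assuming the classic AddressSanitizer 1:8 shadow encoding and an illustrative 80 GB accelerator (figures are illustrative, not benchmarks), shows the scale of the problem:

```python
# ASan-style shadow memory costs 1 shadow byte per 8 bytes of tracked
# application memory. On a GPU that is already full of weights and
# activations, that eighth has to come from somewhere.
GiB = 1024 ** 3
gpu_ram = 80 * GiB        # e.g. a modern 80 GB data-centre GPU
shadow = gpu_ram // 8     # shadow memory displaces model state
print(shadow // GiB)  # 10 -> 10 GiB of weights/activations displaced
```

On a CPU with abundant virtual address space that tax is tolerable; on an accelerator where memory is the binding constraint, it is not.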
Scale and Attack Surface#
A distributed training job spans hundreds of GPUs, each running complex native code stacks. The attack surface grows linearly with scale, while software defenses add overhead at every node.
Long-Running Processes#
Training runs last days to weeks. Temporal safety vulnerabilities (use-after-free, memory leaks) that rarely fire in short-lived processes become near-certainties in long-running workloads.
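The arithmetic is simple: if a latent temporal safety bug fires with probability p in any given hour, the chance of at least one trigger over t hours is 1 − (1 − p)^t. An illustrative calculation (the per-hour probability is assumed, not measured):

```python
# Probability of at least one trigger over a run of t hours, given an
# assumed per-hour trigger probability p. Numbers are illustrative.
p = 0.001                                 # 0.1% chance per hour
short_job = 1 - (1 - p) ** 1              # one-hour batch job
training_run = 1 - (1 - p) ** (21 * 24)   # three-week training run
print(round(short_job, 4), round(training_run, 2))
```

Under these assumptions the one-hour job almost never hits the bug, while the three-week run trips it roughly two times in five.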
Hardware-Enforced Memory Safety for AI#
CHERI-based hardware addresses these challenges at the architectural level:
| AI Constraint | Software Defense Problem | CHERI Solution |
|---|---|---|
| Real-time latency | ASan adds ~2× overhead | CHERI bounds checks are in-line, ~2–5% overhead |
| GPU memory pressure | Shadow memory consumes scarce GPU RAM | CHERI metadata is in-band (128-bit pointers) — no separate shadow |
| Scale | Each node adds software defense overhead | Hardware enforcement — zero software overhead per check |
| Long-running processes | UAF probability increases over time | Temporal safety via capability revocation — deterministic |
| Untrusted inputs | Sandboxes are bypassable | CHERI compartments — hardware-enforced isolation |
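The capability model behind that table can be sketched in a few lines. This is a conceptual model only, with illustrative names, not a real CHERI API: on CHERI hardware the base, bounds, and permissions travel with the pointer itself, and the processor checks every dereference.

```python
# Conceptual model of a CHERI capability: a pointer that carries its
# own base, length, and permissions. Illustrative sketch, not real
# CHERI semantics or API.
class Capability:
    def __init__(self, buf, base, length, perms=frozenset({"load", "store"})):
        self.buf, self.base, self.length, self.perms = buf, base, length, perms

    def load(self, offset):
        # Out-of-bounds or unpermitted access traps deterministically,
        # instead of silently reading adjacent memory.
        if "load" not in self.perms or not (0 <= offset < self.length):
            raise MemoryError("capability fault")
        return self.buf[self.base + offset]

memory = bytearray(b"weights|secrets")
cap = Capability(memory, base=0, length=7)  # bounds cover "weights" only

assert cap.load(0) == ord("w")  # in-bounds load succeeds
try:
    cap.load(8)                 # reaching past the bounds faults
except MemoryError as e:
    fault = str(e)
print(fault)  # capability fault
```

The key property is that the check is part of the dereference itself, so there is no separate shadow structure to maintain and no software check for an attacker to route around.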
How AGIACC frames the stack (directional)#
We are not claiming that everything below ships as a product today; this is how we prioritise research and architecture as a young company working with CHERI-class platforms:
- GPU-adjacent control — Reducing trust in monolithic drivers and DMA paths by bounding privilege around data movement (the details depend on SoC and IOMMU reality; capabilities help on the CPU side of the story).
- Training clusters — Interest in bounding worker memory and RPC surfaces so one bad node is less able to poison an entire run (composes with cluster identity and attestation work elsewhere).
- Serving / multi-tenant edge — Compartments as a pattern for isolating inference pipelines where silicon exists.
- Embodied AI — Our near-term centre of gravity: robots, vehicles, and controllers where a fault becomes physical.
Critical, long-running AI stacks eventually need hardware-enforced trust boundaries — capabilities are the class of mechanism we bet on first.
← Back to: Memory Safety Overview