The problem nobody talks about
You've given your AI agent access to your email. Your calendar. Your file system. Maybe your company's codebase.
Now ask yourself: what happens if the agent goes wrong?
Not dramatically wrong. Not Hollywood-wrong. Just... slightly wrong. It misunderstands an instruction. A tool call has an unintended side effect. A third-party API misbehaves. And your agent has the permissions to act on it — because that's how you set it up.
Most AI agents run directly on the host operating system. Every file they can touch, they can delete. Every folder they can read, they can upload. Every API they're connected to, they can call. There are no walls. There's just the agent, your machine, and your trust.
We decided that wasn't good enough.
What isolation actually means
When we say your agent runs in an isolated environment, we don't mean we put some warnings in the UI. We mean it runs in a container — a separate computing environment with its own filesystem, its own network stack, and kernel-enforced separation from the host operating system.
The technical name is kernel namespace isolation. Your agent's process literally cannot see your home folder. Cannot access paths outside its allocated space. Cannot make outbound network requests to anything we haven't explicitly allowed. Not because we asked it nicely — because the OS enforces it at the kernel level.
Here's what it looks like in practice:
# From inside the agent's sandbox
ls /Users/
# → ls: cannot access '/Users/': No such file or directory
That's not a permission error. That directory doesn't exist from inside the container. The filesystem itself is different.
How we built it: OpenShell
The runtime we chose is OpenShell — a purpose-built secure runtime for autonomous AI agents, developed by NVIDIA and released in March 2026. We were among the first to test it in production conditions.
OpenShell runs your agent inside a Docker container managed by a lightweight Kubernetes cluster. That sounds heavyweight, but the user experience is a single command. It handles:
Filesystem isolation — the agent's workspace is completely separate from the host. No cross-contamination in either direction. Files you create inside the sandbox live there. Files you have outside are invisible to the agent unless you explicitly pass them in.
Network egress policy — every outbound network request from the agent is inspected. We run an explicit allow-list: the agent can reach Anthropic's API and Telegram. Nothing else. If an instruction asked the agent to call a URL we haven't approved, the connection is refused at the network layer — before it leaves the machine.
Credential injection — API keys and tokens are never written to disk. They're injected into the container at runtime as environment variables and exist only in memory while the sandbox is running. When the sandbox stops, they're gone.
Process isolation — seccomp profiles block syscalls that could allow privilege escalation. The agent cannot become root. It cannot install kernel modules. It cannot modify its own security policy.
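The egress policy above is enforced at the network layer, but its logic is easy to picture at the application level. Here's a minimal sketch of an allow-list check — the hostnames mirror the two services named above, and the function name is illustrative, not OpenShell's actual API:

```python
# Sketch of an egress allow-list decision, similar in spirit to what a
# network policy layer enforces. Hostnames mirror the article's example;
# the function itself is illustrative.
from urllib.parse import urlparse

ALLOWED_HOSTS = {"api.anthropic.com", "api.telegram.org"}

def egress_allowed(url: str) -> bool:
    """Return True only if the URL's exact host is on the allow-list."""
    host = urlparse(url).hostname
    return host in ALLOWED_HOSTS

print(egress_allowed("https://api.anthropic.com/v1/messages"))  # True
print(egress_allowed("https://evil.example.com/upload"))        # False
```

Note the exact-match comparison: a lookalike host such as `api.anthropic.com.evil.example` fails the check, which is why allow-lists are compared on the full hostname rather than a prefix.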
Why this matters for your work
Consider what a capable AI agent actually has access to when you set it up properly:
- Your email (read and write)
- Your calendar (read and schedule)
- Your file system (read, write, execute)
- Your code repositories (read, modify, push)
- Your connected APIs (billing, CRM, project management)
That's substantial access. It needs to be substantial — otherwise the agent can't do its job. But substantial access combined with a direct connection to the host OS means one misbehaving tool call can have real consequences.
With container isolation:
- A runaway write operation stops at the sandbox boundary
- A compromised third-party tool can't read your files
- An agent that gets confused about its scope can't affect anything outside the permitted zone
- If something goes wrong, the blast radius is the sandbox — not your machine
What this doesn't protect against
Honest security documentation includes the limits.
The sandbox isolates the runtime — the process execution environment. It doesn't protect against:
Bad instructions — if you tell your agent to delete your documents, it will attempt to delete them inside the workspace. The sandbox doesn't evaluate the quality of your instructions.
Data you pass in — when you connect your email or calendar, that data comes into the sandbox. The sandbox prevents the agent from accessing things outside its zone; it doesn't restrict what it does with what you've given it access to.
Credential leakage via output — if the agent writes a file that includes an API key, that key is now in the sandbox filesystem. The egress allow-list limits where it could be sent, but the isolation doesn't prevent the agent from making mistakes with data it already holds.
This is why we have a second layer: permission policies. Every capability your agent has is explicitly configured. It doesn't have access to services you haven't connected. And you can inspect the configuration at any time.
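A permission policy of this kind boils down to deny-by-default lookup: anything you haven't explicitly connected simply isn't there. A minimal sketch — the capability names and policy shape here are hypothetical, not Stomme's actual configuration format:

```python
# Hypothetical permission policy: an explicit allow-list of capabilities.
# Anything absent from the policy is denied by default.
AGENT_POLICY = {
    "email.read": True,
    "email.send": True,
    "calendar.read": True,
}

def can(capability: str) -> bool:
    """Deny-by-default: only explicitly granted capabilities pass."""
    return AGENT_POLICY.get(capability, False)

print(can("email.send"))   # True
print(can("billing.pay"))  # False — never connected, so it doesn't exist
```

The important property is the default: the question is never "is this forbidden?" but "was this ever granted?"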
The credentials question
When you set up your Stomme agent, you provide API keys for the services you want to connect. Here's exactly what happens to them:
1. You enter them during onboarding
2. They're transmitted over HTTPS to our provisioning system
3. The provisioning system injects them as environment variables into your agent's sandbox at startup
4. They're never written to your host machine's disk
5. They're never stored on our servers beyond what's needed for the injection
6. When your sandbox is not running, they don't persist anywhere
The onboarding process captures them. The sandbox runtime uses them. When you're done, they're gone.
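The injection step can be sketched in a few lines: the secret is merged into the child process's environment in memory and never touches a file. This is an illustrative sketch of the pattern, not Stomme's provisioning code — the variable name `API_KEY` and the helper are assumptions:

```python
# Sketch of runtime credential injection: the secret exists only in the
# parent's memory and the child's environment, never on disk.
import os
import subprocess
import sys

def launch_sandboxed(cmd: list[str], secrets: dict[str, str]) -> str:
    """Run cmd with secrets merged into its environment, in memory only."""
    env = {**os.environ, **secrets}
    result = subprocess.run(cmd, env=env, capture_output=True, text=True)
    return result.stdout.strip()

# The child process can read the key; nothing was written to any file.
out = launch_sandboxed(
    [sys.executable, "-c", "import os; print(os.environ['API_KEY'][:4] + '...')"],
    {"API_KEY": "sk-test-1234"},
)
print(out)  # → sk-t...
```

When the child exits, its environment goes with it — which is exactly the lifecycle described in the steps above.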
The approval gate
Sandboxing isn't just about restriction — it's about control. A well-designed agent environment includes approval gates: points where the agent pauses and asks for permission before taking high-impact actions.
Send an email on my behalf? Ask first. Delete a file? Ask first. Access a new service I haven't connected? Ask first.
This is the difference between automation and autonomy. Automation does what you programmed. Autonomy does what it thinks is right. An approval gate gives you the best of both: an agent that handles routine work independently, but defers to you when the stakes are high.
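The gate itself is a simple control-flow pattern: classify the action, and pause for a decision when it's high-impact. A minimal sketch — the action names and the `approve` callback are illustrative, standing in for however a real agent surfaces the question to you:

```python
# Minimal approval-gate sketch: routine actions run straight through;
# high-impact actions require an explicit yes from the approve callback.
HIGH_IMPACT = {"send_email", "delete_file", "connect_service"}

def run_action(action: str, approve, execute) -> str:
    """Gate high-impact actions behind a human decision."""
    if action in HIGH_IMPACT and not approve(action):
        return "blocked: awaiting approval"
    return execute(action)

# Routine work proceeds; a high-stakes action without approval is held.
print(run_action("summarise_inbox", approve=lambda a: False,
                 execute=lambda a: f"done: {a}"))  # → done: summarise_inbox
print(run_action("send_email", approve=lambda a: False,
                 execute=lambda a: f"done: {a}"))  # → blocked: awaiting approval
```

Making the `HIGH_IMPACT` set configurable is what lets the boundary move as your trust in the agent grows, without ever removing the gate itself.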
At Stomme AI, every agent has configurable approval gates. You decide what's routine and what requires your sign-off. The boundaries evolve as your trust in the agent grows — but the gates are always there.
The practical result
What does all of this feel like for you?
Nothing, mostly. The security architecture is invisible when everything is working correctly — which is the point. You send your agent a task. It executes. The results come back.
What changes is what you can trust. You can give your agent genuinely capable access — email, files, code — knowing there are hard limits on what it can affect. You can expand what your agent does over time, knowing the failure mode of any mistake is bounded.
The agent that handles your morning meeting preparation, triages your inbox, and deploys your code isn't running loose on your machine. It's running in a purpose-built runtime designed for exactly this kind of high-trust, high-stakes work.
The uncomfortable truth
Most AI companies don't sandbox their agents because it's harder to build. An agent with unrestricted access to your system is easier to develop, easier to demonstrate, and easier to market. "Look at everything it can do!" is a better sales pitch than "Look at everything it can't do!"
But the first time an unconstrained agent sends the wrong email, deletes the wrong file, or leaks the wrong credential, the question becomes: why was it allowed to in the first place?
We'd rather build the guardrails now than apologise for the damage later.
That's the stomme: the load-bearing structure underneath. You don't see it. But without it, nothing stays up.
Technical summary
| Layer | Technology | What it provides |
|---|---|---|
| Container isolation | Docker + K3s | Filesystem and process separation |
| Network policy | OpenShell egress engine | Allow-list for outbound HTTPS |
| Credential handling | Runtime injection | Keys in memory only, never on disk |
| Syscall filtering | seccomp profiles | Block privilege escalation |
| Runtime | OpenShell 0.0.12 | Orchestrates the above |
OpenShell: github.com/NVIDIA/OpenShell
Stomme runs on your Mac. Your agent is isolated by design — not by policy.