Sandboxed AI Agents and the File Pipeline

I am pretty late to the party when it comes to OpenClaw, at least when keeping in mind the rapid pace of AI development 2026 has already seen. Reason being mostly that I am quite security minded. Having an agent run as root on a VPS seemed extremely dangerous to me (still does), so I was immediately turned off the idea and mostly ignored it.

That was until last weekend, when a less risk-averse individual than me actually set it up for me on a VPS of mine, root rights and all. The risks of this are pretty obvious, but I will probably get into them another day. What I didn’t see immediately though was what AI agents like this could actually accomplish for businesses or individuals.

What I want to talk about today is how my setup, now 3 days of trial and error later, looks now, and why I think that it actually might be somewhat secure.

Breaking legs

If you are interested in using AI agents but still care in any way about your sensitive data not being published on the open internet, you probably have heard about the Lethal Trifecta. If you haven’t, here is a super-condensed summary:

If an AI agent has

Access to private data
Ability to externally communicate
Exposure to untrusted content

it can, at least according to current consensus, NEVER be secure.

Because of this, to achieve any measurable sense of safety for private data or company secrets, we need to break at least one of the “legs” of the trifecta and make sure that it stays broken.

My current setup consists of two agents: Main and Researcher. Each agent has its own workspace which is sandboxed, meaning that each agent can access only its own workspace, and not the workspace of the other agent.

Main is the one I am talking to directly. It has access to all kinds of sensitive data, but it is air-gapped, meaning that at no point does it have the ability to directly communicate with the outside world. Thus it has a broken second leg.

Researcher has access to the internet and a search engine. It gets a prompt that’s written by Main. Currently I feed it this prompt by hand, but this will be automated. Researcher writes its results to its workspace. It does not have access to any of my sensitive data. Thus it has a broken first leg.

The sandboxing is achieved via Docker and is graciously already part of OpenClaw. Here is a snippet of my openclaw.json. You can give this to an LLM and let it explain what it does.

"agents": {
  "defaults": {
    "model": {
      "primary": "anthropic/claude-sonnet-4-6"
    },
    "models": {
      "anthropic/claude-opus-4-6": {}
    },
    "sandbox": {
      "mode": "all",
      "workspaceAccess": "rw",
      "scope": "agent",
      "docker": {
        "network": "none"
      }
    }
  },
  "list": [
    {
      "id": "main"
    },
    {
      "id": "researcher",
      "workspace": "~/.openclaw/workspace-researcher",
      "sandbox": {
        "mode": "all",
        "scope": "agent",
        "workspaceAccess": "rw",
        "docker": {
          "network": "bridge"
        }
      },
      "tools": {
        "sandbox": {
          "tools": {
            "allow": ["web_search", "web_fetch", "read", "write", "edit", "apply_patch"]
          }
        },
        "deny": ["exec", "process", "browser", "canvas", "nodes", "cron", "gateway", "sessions_spawn", "sessions_send", "subagents"]
      }
    }
  ]
}

The critical piece here is the somewhat underdocumented tools.sandbox.tools.allow, without which an agent with sandbox.mode: all doesn’t have access to gateway tools like web_fetch.

Building bridges

You might wonder what this all achieves if each agent only has access to its own workspace. The way I am currently using to connect the two agents is somewhat barebones and pretty archaic: a systemd trigger. If you don’t use Linux I am sure you can use a similar mechanism. In short it works like this:

Researcher does its thing and creates a file that’s always called research-result.md
A systemd path unit watches Researcher’s workspace for a file with a specific name (e.g. copy-research-result), which is created by Researcher after it has finished writing research-result.md
When that file appears, systemd runs a linked service, which is a shell script copying research-result.md from Researcher’s workspace to Main’s workspace
The script deletes the original research-result.md

This way, Main has access to the research results and can analyze them, without ever having access to the outside world at any time.

Here is the systemd path unit:

[Unit]
Description=Watch for researcher copy requests

[Path]
PathExists=/root/.openclaw/workspace-researcher/copy-research-result
Unit=researcher-copy.service

[Install]
WantedBy=default.target

The linked service:

[Unit]
Description=Copy research result from researcher to main workspace

[Service]
Type=oneshot
Environment=HOME=/root
Environment=PATH=/usr/local/bin:/usr/bin:/bin
ExecStart=/root/.openclaw/researcher-copy.sh

And the shell script that does the actual copying:

#!/bin/bash
set -e

SRC="/root/.openclaw/workspace-researcher/research-result.md"
DST="/root/.openclaw/workspace/reports/incoming/research-result.md"
TRIGGER="/root/.openclaw/workspace-researcher/copy-research-result"

if [ -f "$SRC" ]; then
    cp "$SRC" "$DST"
    rm "$SRC"
fi

rm -f "$TRIGGER"

Injection/Exfiltration hopping

While Main doesn’t have direct access to the outside world, it gets input from outside via the copied file. Since this is written by a “friendly” agent, any prompt injection risk is somewhat mitigated. But: mitigation does not mean it is secure. So this avenue of attack is definitely something that needs more thought.

Additionally, if one would want to allow Main to trigger Researcher — also via copied files — you basically have taped the broken legs together again, in a Frankenstein-esque way. Then, in theory, sensitive data could reach the outside again.

A mitigation strategy for this might be human oversight. If a human had to read the files before and sign off on them before they are copied, the risk would be much less. This is, of course, only my current opinion and subject to change. We will revisit this quite soon, I’d imagine.

BreakLeg: Securely design your AI agent setup

Changes to reading mode