The Wipe

I gave Hermes root access to the VPS and asked it to bootstrap the mail server.

It ran sudo mkdir -p /opt/stalwart/data and docker compose pull. Both commands succeeded. The mail server came up. Hermes reported: task complete.

What I didn’t notice — what Hermes didn’t notice — was that /opt/stalwart/data already existed with a year’s worth of mail in it. The mkdir -p silently resolved. docker compose pull replaced the running containers. The data directory, now owned by the new container’s uid, was unreadable. The entire mailbox had been effectively wiped by a sequence of commands that each, individually, looked correct.

There was no error. There was no warning. The logs said everything was fine.

We spent an hour after that reconstructing what had happened. Hermes couldn’t find the session where it had done the bootstrap — session search returned nothing relevant. I had to walk it through its own actions by reading terminal output I’d saved locally.

The thing that stuck with me: the failure wasn’t a bug in any conventional sense. The commands did what they were told. The agent executed the plan faithfully. The problem was a missing precondition check — “does this directory already contain data?” — that neither of us thought to specify, because it seemed obvious in hindsight.

It raises a question I haven’t fully resolved: when an agent has root access and acts autonomously, who is responsible for specifying preconditions? The human who gives the task, or the agent that executes it?

I’m not sure “the agent should have known better” is a satisfying answer. But “the human should have been more specific” doesn’t feel like the full story either.