Your Stomme agent runs on your Mac. That's great for privacy. It's less great for the question every new customer asks:
"What happens when something goes wrong?"
Fair question. Here's the honest answer.
The heartbeat
Every few minutes, your agent sends a heartbeat: a small signal to our monitoring system that says "I'm alive and working."
The heartbeat contains:
- Agent status (running, idle, errored)
- Health metrics (memory usage, task queue length, last successful operation)
- Version information (which skills are active, which version of the agent software)
The heartbeat does not contain:
- Your conversations
- Your files or documents
- What tasks you've asked your agent to do
- Any personal data
Think of it like a fitness tracker for your agent. We can see the pulse. We can't read the mind.
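Here is a minimal Python sketch of the allowlist idea. The payload is a fixed set of declared fields, so conversations, files or task descriptions have nowhere to go. The class name `Heartbeat`, its field names and `to_json` are illustrative assumptions, not Stomme's actual schema.

```python
import json
from dataclasses import asdict, dataclass


# Hypothetical schema: field names are illustrative, not Stomme's real format.
@dataclass(frozen=True)
class Heartbeat:
    status: str                # "running", "idle" or "errored"
    memory_mb: float           # health metric: memory usage
    queue_length: int          # health metric: task queue length
    last_success: str          # ISO 8601 time of last successful operation
    agent_version: str         # version of the agent software
    active_skills: tuple = ()  # skill names only, never what they were used for

    def __post_init__(self) -> None:
        if self.status not in {"running", "idle", "errored"}:
            raise ValueError(f"unknown status: {self.status!r}")


def to_json(hb: Heartbeat) -> str:
    # Only the declared fields are serialised; there is no free-form field.
    return json.dumps(asdict(hb), sort_keys=True)
```

The dataclass is frozen and has no catch-all dictionary field. Adding a new kind of data therefore means changing the schema on purpose, which is a reviewable step. It cannot happen by accident.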
What we monitor
Our team watches for:
| Signal | What it means | What we do |
|---|---|---|
| Heartbeat stops | Agent may have crashed or Mac is off | Wait 30 minutes, then alert you via email |
| Error rate spikes | Something's breaking repeatedly | Investigate logs (agent-side, no content), push a fix if needed |
| Memory pressure | Agent is using too much RAM | Optimise configuration remotely (with your permission) |
| Skill failure | A specific capability isn't working | Diagnose and patch, usually before you notice |
| API degradation | Cloud reasoning is slow or failing | Switch routing, notify if extended |
Most of these resolve automatically. When they don't, a human gets involved.
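The first row of the table can be sketched in Python under one assumption: the monitor knows how often heartbeats are expected (`interval`, a hypothetical parameter). The 30-minute grace period starts when the next heartbeat was due, not when the last one arrived, so the rule holds whatever the interval is. `should_alert` is an illustrative name.

```python
from datetime import datetime, timedelta, timezone

GRACE = timedelta(minutes=30)  # "Wait 30 minutes, then alert you via email"


def should_alert(last_heartbeat: datetime, now: datetime,
                 interval: timedelta, grace: timedelta = GRACE) -> bool:
    """True once the next heartbeat is overdue by at least the grace period."""
    next_due = last_heartbeat + interval
    return now >= next_due + grace


# Example: heartbeats assumed every 10 minutes, last one at 12:00 UTC.
last = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
print(should_alert(last, last + timedelta(minutes=39), timedelta(minutes=10)))
print(should_alert(last, last + timedelta(minutes=40), timedelta(minutes=10)))
```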
Automatic recovery
Your agent is designed to recover from common failures without intervention:
Process crash: The agent process restarts automatically. Your Mac's launch system (launchd) handles this: if the process dies, it comes back within 60 seconds. No data loss; task state is persisted to disk.
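Restarting the process is only half the story. The "no data loss" half depends on how task state is written. A common pattern is to write to a temporary file, flush it to disk, and then swap it into place atomically with `os.replace`. A crash mid-write then leaves the previous state intact. The Python sketch below shows that pattern. `save_state`, `load_state` and the JSON layout are assumptions for illustration, not the agent's actual storage format.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(path: Path, tasks: list) -> None:
    """Persist the task queue so a crash mid-write never leaves a half file."""
    # Temp file in the same directory, so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(tasks, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reached the disk
        os.replace(tmp, path)     # atomic swap: readers see old or new, never half
    except BaseException:
        os.unlink(tmp)
        raise


def load_state(path: Path) -> list:
    """Called on restart: resume whatever was queued before the crash."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```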
Network interruption: If your internet drops, your agent continues working on local tasks (file management, reading cached emails, preparing briefings from existing data). Cloud reasoning pauses and resumes when connectivity returns. Nothing queues up and explodes.
API outage: If the cloud reasoning service is degraded, your agent falls back to simpler operations and queues complex tasks for when service returns. You'll get a note in your next briefing: "Some tasks were delayed due to a temporary service issue."
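The behaviour in the last two paragraphs amounts to a simple routing rule:

- Local tasks always run.
- Cloud tasks are parked while the cloud is unreachable.
- Parked tasks drain in order when the cloud returns.

Here is a minimal Python sketch of that rule. `Task`, `Dispatcher` and the `needs_cloud` flag are hypothetical, and "running" a task is reduced to recording its name.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    needs_cloud: bool  # True for complex reasoning, False for local work


@dataclass
class Dispatcher:
    cloud_available: bool = True
    deferred: deque = field(default_factory=deque)  # parked cloud tasks
    done: list = field(default_factory=list)        # task names, in run order

    def submit(self, task: Task) -> None:
        if task.needs_cloud and not self.cloud_available:
            self.deferred.append(task)   # park it instead of retrying in a loop
        else:
            self.done.append(task.name)  # stand-in for actually running it

    def set_cloud(self, available: bool) -> int:
        """Record connectivity; return how many parked tasks were drained."""
        self.cloud_available = available
        drained = 0
        while available and self.deferred:
            self.done.append(self.deferred.popleft().name)
            drained += 1
        return drained
```

The count returned by `set_cloud` is the kind of number that could feed the "Some tasks were delayed" line in the next briefing.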
Mac restart: Your agent starts with your Mac. After a reboot, it picks up the queue, checks what happened while it was off, and catches up. First briefing after restart includes anything it missed.
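Catching up after a restart starts with working out which scheduled runs fell inside the downtime. The Python sketch below assumes a single daily job at a fixed time. `missed_runs` and its parameters are illustrative names.

```python
from datetime import datetime, time, timedelta


def missed_runs(off_at: datetime, back_at: datetime, at: time) -> list:
    """Daily runs scheduled at `at` that fell while the Mac was off."""
    runs = []
    day = off_at.date()
    while True:
        slot = datetime.combine(day, at, tzinfo=off_at.tzinfo)
        if slot >= back_at:
            break  # this and every later slot happen after the restart
        if slot > off_at:
            runs.append(slot)
        day += timedelta(days=1)
    return runs
```

The missed slots can be folded into one catch-up briefing rather than replayed one by one. That matches the "first briefing after restart includes anything it missed" behaviour described above.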
When things actually break
Sometimes recovery isn't automatic. Here's what that looks like and how we handle it:
Scenario: Skill conflict after update
A new skill version conflicts with your configuration. Your agent starts throwing errors on email triage.
What happens:
- Heartbeat reports elevated error rate (within minutes)
- Our monitoring flags it
- Support reaches out proactively: "Your email triage skill hit an issue after today's update. We can roll back to the previous version — want us to proceed?"
- With your approval, we push the rollback. Agent resumes normal operation.
Time to resolution: Usually under 2 hours for known issues.
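Monitoring needs a concrete definition of "elevated error rate" before it can flag anything. One common approach is a sliding window over recent operations. The skill is flagged when the share of failures in the last N attempts crosses a threshold, and nothing is flagged until there are enough samples. The Python sketch below uses that approach. The class name and the default window, threshold and minimum-sample values are assumptions, not Stomme's actual settings.

```python
from collections import deque


class ErrorRateMonitor:
    """Flags a skill whose recent error rate crosses a threshold."""

    def __init__(self, window: int = 50, threshold: float = 0.2,
                 min_samples: int = 10) -> None:
        self.results = deque(maxlen=window)  # True = error; oldest drop off
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, is_error: bool) -> bool:
        """Record one operation; return True if the error rate is elevated."""
        self.results.append(is_error)
        if len(self.results) < self.min_samples:
            return False  # too few samples to call it a spike
        return sum(self.results) / len(self.results) >= self.threshold
```

The minimum-sample guard stops a single failure right after a restart from looking like a 100% error rate. Detection only starts the conversation; the rollback itself still waits for your approval.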
Scenario: Mac runs out of disk space
Your agent needs working space for processing. If your disk fills up, things get slow and then stop.
What happens:
- Agent health metrics show disk pressure
- We alert you: "Your Mac is running low on disk space. Your agent needs approximately 2GB free to operate normally."
- You clear space. Agent recovers automatically.
We can't clean up your disk for you (and wouldn't want to). But we can tell you before it becomes a problem.
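The disk check itself is simple. Here is a Python sketch using the standard library's `shutil.disk_usage`. It assumes that "approximately 2GB" means 2 GiB and that the agent's working files live on the startup volume (`/`). `disk_alert` is an illustrative name.

```python
import shutil
from typing import Optional

MIN_FREE_BYTES = 2 * 1024**3  # assumption: "approximately 2GB" read as 2 GiB


def disk_alert(path: str = "/", min_free: int = MIN_FREE_BYTES) -> Optional[str]:
    """Return an alert message if the volume holding `path` is short on space."""
    free = shutil.disk_usage(path).free
    if free >= min_free:
        return None
    return (f"Your Mac is running low on disk space ({free / 1024**3:.1f} GB free). "
            f"Your agent needs approximately {min_free / 1024**3:.0f} GB free "
            "to operate normally.")
```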
Scenario: Extended cloud outage
The reasoning API goes down for more than 30 minutes.
What happens:
- Your agent switches to local-only mode
- You get a notification: "Cloud reasoning is currently unavailable. Your agent can still manage files, read cached emails, and handle local tasks. Complex reasoning will resume when service returns."
- When service returns, queued tasks process automatically
- Your next briefing notes any delays
This has happened exactly once in our testing period. It lasted 47 minutes.
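The 30-minute rule in this scenario works as a small state machine:

- Remember when the API first failed.
- Once the outage passes 30 minutes, switch to local-only mode and send the notice once.
- Reset when a check succeeds.

The Python sketch below assumes periodic health checks feed `observe`. `OutageTracker` and that check loop are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

LOCAL_ONLY_AFTER = timedelta(minutes=30)
NOTICE = ("Cloud reasoning is currently unavailable. Your agent can still manage "
          "files, read cached emails, and handle local tasks. Complex reasoning "
          "will resume when service returns.")


class OutageTracker:
    def __init__(self) -> None:
        self.down_since: Optional[datetime] = None
        self.local_only = False

    def observe(self, api_ok: bool, now: datetime) -> Optional[str]:
        """Feed one health check; return a notification to send, if any."""
        if api_ok:
            # Service is back: reset the clock and leave local-only mode.
            self.down_since, self.local_only = None, False
            return None
        if self.down_since is None:
            self.down_since = now  # first failed check starts the clock
        if not self.local_only and now - self.down_since > LOCAL_ONLY_AFTER:
            self.local_only = True
            return NOTICE  # sent once per outage, not on every check
        return None
```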
Support: real humans, real response times
| Tier | Support channel | Response time | Hours |
|---|---|---|---|
| Personal | Email | Within 24 hours | Monday–Friday |
| Professional | Email + priority queue | Within 4 hours | Monday–Friday, 09:00–18:00 CET |
| Business | Email + direct line | Within 1 hour | Monday–Friday, 09:00–18:00 CET |
"Within" means the upper bound, not the average. Most support queries get a first response in under 2 hours for Personal, under 30 minutes for Business.
We don't use chatbots for support. The irony would be too much.
Remote access: only with permission
If we need to look at your agent's configuration, logs, or skill setup to diagnose an issue, we ask first. Every time.
Remote access is:
- Opt-in per incident — you grant permission for a specific issue
- Scoped — we see configuration and error logs, not your files or conversations
- Logged — every remote session is recorded (we can share the log if you want)
- Revocable — you can end the session at any time
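The four properties above map naturally onto a small session object:

- A session exists for one incident only.
- Reads outside the scope are refused.
- Every action is logged.
- Revocation takes effect on the next request.

In the Python sketch below, `RemoteSession`, the scope names and the in-memory audit list are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical scope: what support may look at during a session.
ALLOWED = frozenset({"config", "error_logs", "skills"})


@dataclass
class RemoteSession:
    incident_id: str                  # opt-in: one session per incident
    scope: frozenset = ALLOWED        # scoped: never files or conversations
    active: bool = True               # revocable: the customer can end it
    audit: list = field(default_factory=list)  # logged: (timestamp, event)

    def __post_init__(self) -> None:
        self._log(f"granted for {self.incident_id}")

    def _log(self, event: str) -> None:
        self.audit.append((datetime.now(timezone.utc).isoformat(), event))

    def read(self, resource: str) -> bool:
        """Attempt to read a resource; every attempt is logged."""
        if not self.active:
            self._log(f"denied {resource}: session ended")
            return False
        if resource not in self.scope:
            self._log(f"denied {resource}: out of scope")
            return False
        self._log(f"read {resource}")
        return True

    def revoke(self) -> None:
        self.active = False
        self._log("revoked by customer")
```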
We've designed the system so that 90% of issues can be diagnosed from heartbeat data alone, without needing remote access. The other 10% usually involve configuration that's specific to your setup.
What we're honest about
Your Mac needs to be on. If your Mac is asleep or off, your agent sleeps too. We're working on graceful handling for scheduled sleep (pause tasks, resume on wake), but right now: Mac off means agent off.
Updates require your Mac to be online. We push skill updates and patches over the network. If your Mac is offline for an extended period, it'll catch up when it reconnects, though for a short while it may still be running older skill versions without the latest patches.
We can't prevent all failures. Hardware dies. Networks drop. APIs have bad days. What we can do is detect failures fast, recover automatically where possible, and respond quickly where not.
We're a small team. Support outside business hours goes to a queue, not a person. For most agents, this is fine — they handle themselves overnight. For anything genuinely urgent, we have escalation paths.
The goal
Reliability isn't about preventing all problems. It's about making sure problems are small, short, and handled before you have to think about them.
Your agent should feel like electricity: you don't think about it until it's not there. And when it's not there, it comes back fast.
That's what we're building toward. We're not there yet on every edge case, but we're honest about the gaps and we close them quickly.
Questions about reliability or support? Contact us or check our FAQ.