
product · reliability · infrastructure · support

How We Keep Your Agent Running

Your agent runs on your Mac, but we're watching the heartbeat. Here's how monitoring, recovery, and support actually work behind the scenes.

by Saga Lindqvist · March 2026

Your Stomme agent runs on your Mac. That's great for privacy. It's less great for the question every new customer asks:

"What happens when something goes wrong?"

Fair question. Here's the honest answer.

The heartbeat

Every few hours, your agent sends a heartbeat — a small signal to our monitoring system that says "I'm alive and working."

The heartbeat contains:

Agent status (running, idle, errored)
Health metrics (memory usage, task queue length, last successful operation)
Version information (which skills are active, which version of the agent software)

The heartbeat does not contain:

Your conversations
Your files or documents
What tasks you've asked your agent to do
Any personal data

Think of it like a fitness tracker for your agent. We can see the pulse. We can't read the mind.
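
To make that concrete, here is a rough sketch of what a content-free heartbeat record could look like. The field names, the example values, and the JSON encoding are assumptions for illustration, not the actual wire format.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum


class AgentStatus(str, Enum):
    RUNNING = "running"
    IDLE = "idle"
    ERRORED = "errored"


@dataclass(frozen=True)
class Heartbeat:
    # Hypothetical field names; they mirror the three categories listed above.
    status: AgentStatus
    memory_mb: int
    task_queue_length: int
    last_success_at: str  # ISO 8601 timestamp of the last successful operation
    agent_version: str
    active_skills: tuple[str, ...]

    def to_json(self) -> str:
        # Only operational metadata is serialised: no conversations, files, or task content.
        return json.dumps(asdict(self), sort_keys=True)


# Example values are made up for the sketch.
beat = Heartbeat(
    status=AgentStatus.RUNNING,
    memory_mb=412,
    task_queue_length=3,
    last_success_at="2026-03-01T08:15:00+00:00",
    agent_version="1.4.2",
    active_skills=("email-triage", "briefing"),
)
payload = json.loads(beat.to_json())
```

The point of the design is that the record has no field that could carry personal data in the first place, so privacy does not depend on filtering after the fact.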

What we monitor

Our team watches for:

Signal	What it means	What we do
Heartbeat stops	Agent may have crashed or Mac is off	Wait 30 minutes, then alert you via email
Error rate spikes	Something's breaking repeatedly	Investigate logs (agent-side, no content), push a fix if needed
Memory pressure	Agent is using too much RAM	Optimise configuration remotely (with your permission)
Skill failure	A specific capability isn't working	Diagnose and patch, usually before you notice
API degradation	Cloud reasoning is slow or failing	Switch routing, notify if extended

Most of these resolve automatically. When they don't, a human gets involved.
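
The first row of the table boils down to a staleness check. The following is a minimal sketch, assuming the monitoring side stores the time of the last heartbeat it received. The function name and the three-hour interval are hypothetical; only the 30-minute wait comes from the table.

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(minutes=30)  # from the table: wait 30 minutes before alerting


def should_alert(last_heartbeat: datetime, expected_interval: timedelta, now: datetime) -> bool:
    """Return True once a heartbeat is overdue by more than the grace period."""
    overdue_since = last_heartbeat + expected_interval
    return now - overdue_since > GRACE_PERIOD


last = datetime(2026, 3, 1, 8, 0, tzinfo=timezone.utc)
interval = timedelta(hours=3)  # illustrative stand-in for "every few hours"

on_time = should_alert(last, interval, last + timedelta(hours=3, minutes=10))
overdue = should_alert(last, interval, last + timedelta(hours=3, minutes=45))
```

The grace period exists because a late heartbeat usually means a sleeping Mac or a flaky connection, not a crash.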

Automatic recovery

Your agent is designed to recover from common failures without intervention:

Process crash: The agent process restarts automatically. Your Mac's launch system handles this — if the process dies, it comes back within 60 seconds. No data loss; task state is persisted to disk.

Network interruption: If your internet drops, your agent continues working on local tasks (file management, reading cached emails, preparing briefings from existing data). Cloud reasoning pauses and resumes when connectivity returns. Nothing queues up and explodes.

API outage: If the cloud reasoning service is degraded, your agent falls back to simpler operations and queues complex tasks for when service returns. You'll get a note in your next briefing: "Some tasks were delayed due to a temporary service issue."

Mac restart: Your agent starts with your Mac. After a reboot, it picks up the queue, checks what happened while it was off, and catches up. First briefing after restart includes anything it missed.
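
The recovery behaviours above share one idea: task state lives on disk, and tasks that need the cloud wait until it is reachable. Here is a simplified sketch of that idea. The class, the file layout, and the task names are hypothetical, not the agent's real internals.

```python
import json
import tempfile
from pathlib import Path


class TaskQueue:
    """Hypothetical on-disk queue: state survives a crash or reboot."""

    def __init__(self, path: Path):
        self.path = path
        self.tasks = json.loads(path.read_text()) if path.exists() else []

    def add(self, name: str, needs_cloud: bool) -> None:
        self.tasks.append({"name": name, "needs_cloud": needs_cloud})
        self._save()

    def run_ready(self, cloud_available: bool) -> list[str]:
        """Take what can run now; keep cloud tasks queued while the cloud is unreachable."""
        ready = [t for t in self.tasks if cloud_available or not t["needs_cloud"]]
        self.tasks = [t for t in self.tasks if t not in ready]
        self._save()
        return [t["name"] for t in ready]

    def _save(self) -> None:
        # Write to a temp file, then rename, so a crash mid-write
        # should not leave a half-written queue file behind.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(self.tasks))
        tmp.replace(self.path)


queue_file = Path(tempfile.mkdtemp()) / "queue.json"
queue = TaskQueue(queue_file)
queue.add("file cleanup", needs_cloud=False)
queue.add("draft reply", needs_cloud=True)

ran_offline = queue.run_ready(cloud_available=False)  # local work only
restored = TaskQueue(queue_file)                      # simulates a restart
ran_online = restored.run_ready(cloud_available=True) # deferred task resumes
```

In this sketch the deferred task is picked up by a fresh `TaskQueue` reading the same file, which is what "picks up the queue" after a restart amounts to.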

When things actually break

Sometimes recovery isn't automatic. Here's what that looks like and how we handle it:

Scenario: Skill conflict after update

A new skill version conflicts with your configuration. Your agent starts throwing errors on email triage.

What happens:

Heartbeat reports elevated error rate (within minutes)
Our monitoring flags it
Support reaches out proactively: "Your email triage skill hit an issue after today's update. We can roll back to the previous version — want us to proceed?"
With your approval, we push the rollback. Agent resumes normal operation.

Time to resolution: Usually under 2 hours for known issues.
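
The key property in this scenario is that a rollback never happens without your approval. Here is a small sketch of that gate; the `Skill` class and the version numbers are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    version: str
    previous_versions: list[str] = field(default_factory=list)

    def update(self, new_version: str) -> None:
        # Remember the outgoing version so it can be restored later.
        self.previous_versions.append(self.version)
        self.version = new_version

    def rollback(self, approved_by_customer: bool) -> bool:
        """Revert to the previous version, but only with explicit approval."""
        if not approved_by_customer or not self.previous_versions:
            return False
        self.version = self.previous_versions.pop()
        return True


triage = Skill(name="email-triage", version="2.3.0")
triage.update("2.4.0")  # the update that introduced the conflict
blocked = triage.rollback(approved_by_customer=False)
version_without_approval = triage.version
done = triage.rollback(approved_by_customer=True)
```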

Scenario: Mac runs out of disk space

Your agent needs working space for processing. If your disk fills up, things get slow and then stop.

What happens:

Agent health metrics show disk pressure
We alert you: "Your Mac is running low on disk space. Your agent needs approximately 2GB free to operate normally."
You clear space. Agent recovers automatically.

We can't clean up your disk for you (and wouldn't want to). But we can tell you before it becomes a problem.
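
A disk check like this is simple to sketch with Python's standard library. The 2GB threshold comes from the alert text above; the function name and the exact message format are assumptions.

```python
import shutil
from typing import Optional

MIN_FREE_BYTES = 2 * 1024**3  # roughly the 2GB mentioned in the alert text


def disk_alert(free_bytes: int, minimum: int = MIN_FREE_BYTES) -> Optional[str]:
    """Return an alert message when free space is below the minimum, else None."""
    if free_bytes >= minimum:
        return None
    free_gb = free_bytes / 1024**3
    return (
        f"Your Mac is running low on disk space ({free_gb:.1f}GB free). "
        "Your agent needs approximately 2GB free to operate normally."
    )


# shutil.disk_usage is part of the standard library; "/" is just an example path.
free_now = shutil.disk_usage("/").free
current_alert = disk_alert(free_now)
```

Only the number of free bytes is needed for this check, which is why it fits in the health metrics without touching any file contents.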

Scenario: Extended cloud outage

The reasoning API goes down for more than 30 minutes.

What happens:

Your agent switches to local-only mode
You get a notification: "Cloud reasoning is currently unavailable. Your agent can still manage files, read cached emails, and handle local tasks. Complex reasoning will resume when service returns."
When service returns, queued tasks process automatically
Your next briefing notes any delays

This has happened exactly once in our testing period. It lasted 47 minutes.
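
One way to picture the 30-minute threshold is a tracker that notifies once per outage and resets when service returns. This is a sketch under the assumption that the notification fires when the outage passes 30 minutes; the class name and the shortened message are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

EXTENDED_OUTAGE = timedelta(minutes=30)  # threshold from the scenario above


class OutageTracker:
    """Hypothetical helper: tracks a cloud outage and notifies once if it becomes extended."""

    def __init__(self) -> None:
        self.down_since: Optional[datetime] = None
        self.notified = False

    def observe(self, cloud_ok: bool, now: datetime) -> Optional[str]:
        if cloud_ok:
            # Service is back: reset so a future outage is tracked from scratch.
            self.down_since, self.notified = None, False
            return None
        if self.down_since is None:
            self.down_since = now
        if not self.notified and now - self.down_since > EXTENDED_OUTAGE:
            self.notified = True
            return "Cloud reasoning is currently unavailable."
        return None


start = datetime(2026, 3, 1, 12, 0, tzinfo=timezone.utc)
tracker = OutageTracker()
messages = [
    tracker.observe(False, start),
    tracker.observe(False, start + timedelta(minutes=20)),
    tracker.observe(False, start + timedelta(minutes=31)),
    tracker.observe(False, start + timedelta(minutes=40)),
    tracker.observe(True, start + timedelta(minutes=47)),
]
```

Notifying once, rather than on every failed check, keeps a long outage from turning into a stream of identical messages.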

Support: real humans, real response times

Tier	Support channel	Response time	Hours
Starter	Email	Within 24 hours	Monday–Friday
Professional	Email + priority queue	Within 4 hours	Monday–Friday, 09:00–18:00 CET
Business	Email + direct line	Within 1 hour	Monday–Friday, 09:00–18:00 CET

"Within" means the upper bound, not the average. Most support queries get a first response in under 2 hours for Starter, under 30 minutes for Business.

We don't use chatbots for support. The irony would be too much.

Remote access: only with permission

If we need to look at your agent's configuration, logs, or skill setup to diagnose an issue, we ask first. Every time.

Remote access is:

Opt-in per incident — you grant permission for a specific issue
Scoped — we see configuration and error logs, not your files or conversations
Logged — every remote session is recorded (we can share the log if you want)
Revocable — you can end the session at any time

We've designed the system so that 90% of issues can be diagnosed from heartbeat data alone, without needing remote access. The other 10% usually involve configuration that's specific to your setup.
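
The four properties above map naturally onto a small session object. The sketch below is illustrative only: the class, the scope names, and the incident id are assumptions, not the real access-control code.

```python
from dataclasses import dataclass, field

# What support may see, per the "Scoped" item above (names are hypothetical).
ALLOWED_SCOPES = {"configuration", "error_logs"}


@dataclass
class RemoteSession:
    incident_id: str  # opt-in per incident
    active: bool = True
    audit_log: list[str] = field(default_factory=list)

    def read(self, resource: str) -> bool:
        """Record every access attempt; allow it only while active and in scope."""
        allowed = self.active and resource in ALLOWED_SCOPES
        self.audit_log.append(f"{resource}: {'allowed' if allowed else 'denied'}")
        return allowed

    def revoke(self) -> None:
        # The customer can end the session at any time.
        self.active = False
        self.audit_log.append("session revoked by customer")


session = RemoteSession(incident_id="INC-001")  # hypothetical incident id
saw_config = session.read("configuration")
saw_files = session.read("files")
session.revoke()
after_revoke = session.read("error_logs")
```

Denied attempts are logged alongside allowed ones, so the session log shows everything that was requested, not just what was granted.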

What we're honest about

Your Mac needs to be on. If your Mac is asleep or off, your agent sleeps too. We're working on graceful handling for scheduled sleep (pause tasks, resume on wake), but right now: Mac off means agent off.

Updates require your Mac to be online. We push skill updates and patches over the network. If your Mac is offline for an extended period, it'll catch up when it reconnects, but until it does, your agent may be running older skill versions.

We can't prevent all failures. Hardware dies. Networks drop. APIs have bad days. What we can do is detect failures fast, recover automatically where possible, and respond quickly where not.

We're a small team. Support outside business hours goes to a queue, not a person. For most agents, this is fine — they handle themselves overnight. For anything genuinely urgent, we have escalation paths.

The goal

Reliability isn't about preventing all problems. It's about making sure problems are small, short, and handled before you have to think about them.

Your agent should feel like electricity: you don't think about it until it's not there. And when it's not there, it comes back fast.

That's what we're building toward. We're not there yet on every edge case, but we're honest about the gaps and we close them quickly.

Questions about reliability or support? Contact us or check our FAQ.
