9 Comments
Eddy Bogomolov

The real reason is almost always the same. It worked in the demo, then quietly rotted in the 20 hours a week nobody wants to spend keeping it alive. Four months is a long run. Did yours die from drift, or from you just not trusting its output anymore?

Mykyta

Neither, honestly. Mine didn't rot or drift; it just sat there working fine while I never built the habit of opening it. The failure was upstream of trust. I never created a reason to talk to it daily, so it became something I knew was running and still ignored.

That's the part I think gets missed. We obsess over drift and output quality, but a lot of agents die because they never become part of a routine. No trigger, no ritual, no place in the day. It's not "I stopped trusting it." It's "I forgot it existed."

The 20-hours-a-week problem you're describing is real, but it assumes you're at least showing up. I wasn't even doing that. Which is almost worse, because the thing was perfectly capable.

When yours died, did you actually catch the rot happening, or did you just notice one day you'd quietly stopped reaching for it?

Tam Nguyen

Haven't tried OpenClaw, but have tried ClaudeClaw and Hermes. I love my Hermes. When I need to do work with more nuance and processing power, I use my ClaudeClaw. If you find yourself with multiple harnesses, check out https://github.com/xingkongliang/skills-manager. It lets you share skills more easily across all your agents/CLIs. It installs in Chinese, though, so you'll need to switch the language :)

Mykyta

Bookmarking that skills-manager repo, thanks. The cross-harness skill sharing is exactly the pain I keep hitting when I run more than one agent. Chinese install is a fun bonus.

How do you decide when something graduates from Hermes to ClaudeClaw? Is it gut feel, or do you have a rule for it?

Tam Nguyen

Side note: the setup changes constantly so rules aren't all that helpful haha

Tam Nguyen

Most of my crons run on Hermes (Codex OAuth). Managing API costs stresses me out haha. Data management, CRM updates, content and web scraping, etc. mostly live on Hermes.

For anything related to writing or transcript analysis, I'll go with Claude. Most tasks are done through manual invocation right now, since I don't really want to manage crons across two stacks. For strategic planning and decisions, I may run both systems to see how they come back.

So, to answer your question: no real rule. Most of my data lives in Obsidian or Notion so that I can plug and play any harness to get the best outcome for the task. The goal is to avoid vendor lock-in if I can help it.

Colleen Avarene

The Morning-Open Test is the most useful diagnostic I've seen for agent setups, and I think it applies way beyond OpenClaw vs Hermes. Eight hours a week maintaining infrastructure that technically works but nobody wants to touch — that's not an agent, that's a second job you gave yourself for free. The distinction between "does it function" and "do I want to be in this relationship" is the same one we see on the business side with off-the-shelf chatbots vs custom agents. Clients don't churn because the bot broke. They churn because opening it feels like a chore.

The seven migration steps are solid and honest — especially running parallel systems instead of ripping and replacing. That's the advice most people skip because it sounds slow, and then they lose continuity they can't get back. One thing I'd add: sometimes the agent that fails the Morning-Open Test isn't the wrong tool — it's the right tool with the wrong scaffolding. Before you kill it, check whether the problem is the platform or the instructions you gave it when it wakes up. That distinction saved at least one setup I know of.

Mykyta

The "right tool, wrong scaffolding" point is the one I underweighted, and you're right to flag it.

When my agent failed the morning test, my first instinct was to blame the platform. But honestly half of it was that I'd never sat down and written what it should do the moment it wakes up. It was capable. It was just sitting there waiting for me to tell it something, every single time. No proactive surface. That's a scaffolding problem, not a tool problem.

The hard part is telling the two apart before you've already invested weeks. My rough rule now: if I dread opening it but the outputs are good when I do bother, that's scaffolding. If the outputs themselves are wrong or slow, that's the tool. The first is fixable in an afternoon. The second isn't.
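The rough rule above boils down to two yes/no questions, and the order matters: bad outputs point at the tool no matter how opening it feels. A minimal sketch in Python, assuming you already have honest answers to both questions as booleans; `triage` and `Diagnosis` are hypothetical names, not part of OpenClaw, Hermes, or any other harness:

```python
from enum import Enum


class Diagnosis(Enum):
    SCAFFOLDING = "scaffolding"  # fixable in an afternoon: rewrite what it does on wake-up
    TOOL = "tool"                # the platform itself is the problem
    HEALTHY = "healthy"          # no dread, good outputs: nothing to fix


def triage(dread_opening: bool, outputs_good: bool) -> Diagnosis:
    """Hypothetical one-question triage: are the outputs themselves the problem?"""
    # Wrong or slow outputs implicate the tool, regardless of how opening it feels.
    if not outputs_good:
        return Diagnosis.TOOL
    # Good outputs but you dread opening it: the scaffolding is what needs work.
    if dread_opening:
        return Diagnosis.SCAFFOLDING
    return Diagnosis.HEALTHY


print(triage(dread_opening=True, outputs_good=True).value)  # scaffolding
```

Checking output quality first keeps a tool problem from being misread as a habit problem.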

The chatbot churn parallel lands for me. "It broke" is rare. "It's a chore to open" is the silent killer nobody puts in a bug report.

How do you usually catch the chore feeling on the client side before they quietly stop logging in?

Colleen Avarene

That diagnostic is sharp — "dread the opening but like the outputs" vs "the outputs themselves are wrong." That's a one-question triage that saves weeks of misdiagnosis. Stealing that.

On catching the chore feeling before clients ghost: honestly, the best signal is usage frequency. We watch for the drop-off pattern — first two weeks are high engagement, then it tapers. If someone goes from checking daily to checking twice a week, that's not them being busy. That's the chore creeping in. We reach out at the taper, not after the silence.
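The taper signal described above can be sketched as a comparison between a user's first two weeks and their most recent week. A minimal Python sketch, assuming you log the calendar days on which each user opened the agent; the function names, the 7-day recent window, and the `drop_ratio=0.5` threshold are all assumptions for illustration, not a description of Colleen's actual tooling:

```python
from datetime import date, timedelta


def weekly_open_rate(open_days: set[date], start: date, days: int) -> float:
    """Active days per week (days with at least one open) over [start, start + days)."""
    active = sum(1 for i in range(days) if start + timedelta(days=i) in open_days)
    return active * 7 / days


def is_tapering(open_days: set[date], signup: date, today: date,
                baseline_days: int = 14, recent_days: int = 7,
                drop_ratio: float = 0.5) -> bool:
    """Flag users whose last full week fell well below their first-two-weeks baseline.

    drop_ratio=0.5 is an assumed threshold: daily (7/week) down to
    twice a week (2/week) trips it.
    """
    # Wait until there is a full baseline window plus a full recent window.
    if (today - signup).days < baseline_days + recent_days:
        return False
    baseline = weekly_open_rate(open_days, signup, baseline_days)
    # Recent window is the last `recent_days` full days, excluding today.
    recent = weekly_open_rate(open_days, today - timedelta(days=recent_days), recent_days)
    return baseline > 0 and recent < baseline * drop_ratio
```

Comparing each user to their own early baseline, rather than to a fixed number, is what makes "daily to twice a week" stand out as a taper instead of a low-usage customer.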

The other thing we do is bake the Morning-Open Test into the build itself. The agent's first interaction of the day isn't a status report — it's something the owner actually wants to see. A booking that came in overnight, a client compliment, something the agent handled that the owner would have had to do at 6 AM. If the first thing you see when you open it makes you think "oh nice," you keep opening it. If the first thing you see is a dashboard, you stop.
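The "oh nice, not a dashboard" idea above can be sketched as a small function that picks what the agent leads with in the morning. A minimal Python sketch, assuming overnight activity arrives as tagged events; the `Event` type, the event kinds, and their priority order are hypothetical and only illustrate the ordering idea:

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str     # hypothetical kinds: "booking", "compliment", "handled", "status"
    summary: str


# Assumed priority: things the owner is glad to see come first; plain status is left out.
PRIORITY = {"booking": 0, "compliment": 1, "handled": 2}


def morning_opener(overnight: list[Event], max_items: int = 3) -> str:
    """Lead with the overnight wins instead of a status dashboard."""
    wins = [e for e in overnight if e.kind in PRIORITY]
    if not wins:
        # Nothing worth celebrating: one short line beats a wall of metrics.
        return "Good morning. Quiet night."
    # Stable sort keeps arrival order within the same kind.
    wins.sort(key=lambda e: PRIORITY[e.kind])
    lines = [f"- {e.summary}" for e in wins[:max_items]]
    return "Good morning. Overnight:\n" + "\n".join(lines)
```

Status events are still available elsewhere; the sketch just keeps them out of the first thing the owner sees.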

The scaffolding fix you described — writing what it should do the moment it wakes up — is literally 80% of what we do during voice calibration. Most people build the capability and forget to build the relationship. The agent can do everything but nobody told it how to say good morning.