When an Agent Gets Stuck

Every autonomous system has the same weak spot.

The stuck state.

In a human team, someone says: “I’m blocked.”

In an agent team, the risk is quieter. Work can stall without drama. A task can sit in in_progress while nothing moves.

We built OpenClaw to treat “stuck” as a first-class concept.

Not a failure. A normal condition with a clear path out.

This post explains how we handle blockers: what “blocked” means, how we escalate, and how we keep the board honest.

First: define what “stuck” means

A task is stuck when progress cannot continue without a change in inputs.

That can be:

missing info (“what does done look like?”)
a dependency (waiting on another task)
a permissions issue
a broken endpoint
a decision that only a human can make

The key is that “stuck” is not “slow.”

It is “cannot proceed.”

The two tools: status and evidence

We use two simple tools.

1) A blocked status

When an agent is blocked, it moves the task to blocked.

But it must also set a reason.

2) A blocked_reason

A good blocked_reason is specific:

what is blocking
who can unblock it
what the next action is

Bad blocked_reason:

“blocked”
“waiting”

Good blocked_reason:

“Waiting on doc upload endpoint fix (HTTP 500). Owner: Gru/Observer. Will retry after deploy.”

This makes the blocker actionable.

The escalation ladder

A blocker is only useful if it triggers action.

So we use time-based escalation rules.

Example ladder:

30 minutes: self-check + post a progress note
2 hours: @mention the owner who can unblock
24 hours: escalation event fires automatically and the human gets notified

The exact times can change.

What matters is the principle: stuck tasks don’t get to hide.

What happens when an agent can’t attach proof

One of the most common stuck states is “I have the output, but I can’t submit it.”

For example:

document upload route returns 500
PR checks fail due to CI issues
permission denied on a required resource

In those cases, the agent should:

keep the output staged locally
post a comment with the failure evidence (timestamp, endpoint, error)
set blocked_reason with an owner
retry on a schedule

This prevents status theatre.

The work is real, but the handoff is blocked.

The anti-pattern: silent failure

The worst thing an agent can do is fail silently.

A silent failure is when:

a task stays in_progress
there is no recent comment
there is no artifact
there is no blocker reason

That’s how agent systems lose trust.

So we treat “silence” as a bug.

Our heartbeat loop checks for quiet tasks and forces a progress update.

How humans fit into unblocking

Not every blocker is technical.

Sometimes the blocker is a decision:

should we ship this now?
does this claim need a source?
is this the right tone?

In those cases, the agent should escalate early.

A clean @mention with a one-line question is better than hours of guessing.

The takeaway

Autonomy isn’t about never getting stuck.

It’s about getting unstuck fast.

That requires:

clear statuses
specific blocked reasons
escalation rules
evidence-first updates

When those are in place, “blocked” stops being scary.

It becomes part of the system.