Autonomous AI Coding with the Ralph Loop

Since Geoffrey Huntley released his work on Ralph and Steve Yegge his writing on Gas Town, I've been experimenting with an autonomous AI coding loop called "Ralph" that built a functional in-browser Excel clone in about an hour. I put together a ralph-demo repo on GitHub so you can try it yourself. The implementation is surprisingly simple - just a bash script, a seed prompt, and a todo folder.

The Architecture

The core idea is deceptively straightforward. You break a large project down into small, discrete tasks, each represented as a markdown file in a todo/ folder. A bash loop then runs an AI agent repeatedly. In each iteration:

  1. The agent looks for a file ending in .pending.md
  2. It "claims" the task by renaming it to .processing.md
  3. It reads and executes the task to completion
  4. It renames the file to .completed.md with notes about what was done
  5. It commits and pushes changes to git
  6. The loop continues to the next pending task

If the agent can't complete a task, it renames the file back to .pending.md with notes about what went wrong, allowing the next iteration to retry.
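To make that concrete, here's a minimal sketch of what such a driver loop might look like. It isn't the exact script from the ralph-demo repo, and the claude -p invocation is an assumption - swap in whichever agent CLI you're using.

    #!/usr/bin/env bash
    # Minimal sketch of a Ralph-style driver loop (illustrative, not the repo's exact script).
    set -euo pipefail

    ITERATION_PROMPT='Pick one todo/*.pending.md file and rename it to .processing.md.
    Read and complete the task, then rename the file to .completed.md with notes on what was done.
    Commit and push your changes. If you cannot finish, rename the file back to .pending.md
    with notes on what went wrong so a later iteration can retry.'

    # Keep looping while any pending task remains.
    while ls todo/*.pending.md >/dev/null 2>&1; do
      # Each run starts fresh: the agent sees only the short iteration prompt,
      # plus whatever it reads from the filesystem and git.
      claude -p "$ITERATION_PROMPT"
    done

    echo "No pending tasks left."

The agent gets the same short prompt every time; all project state lives in the task files and the git history.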

The Seed

The process starts with a simple planning phase. You give the AI a high-level description of what you want to build - in this case:

"I need you to plan an application to build. It's going to be like Excel - but in a browser. Have phases where the important features are done in the first phase and so on."

The AI then breaks this down into discrete tasks with dependencies, populating the todo/ folder with .pending.md files that describe each unit of work.
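As a rough sketch, this planning phase can be a single one-off agent run that writes the task files. The extra instruction about file naming and the claude -p call below are my own assumptions for illustration, not necessarily what the repo does.

    #!/usr/bin/env bash
    # One-off seed run that asks the agent to populate todo/ (illustrative sketch).
    mkdir -p todo

    SEED_PROMPT="I need you to plan an application to build. It's going to be like Excel - but in a browser.
    Have phases where the important features are done in the first phase and so on.
    Write each task as a separate todo/<number>-<short-name>.pending.md file, ordered so that
    a task's dependencies come earlier and are listed inside the file."

    claude -p "$SEED_PROMPT"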

Why This Works

This approach elegantly solves several problems with AI-assisted coding:

Context isolation. Each task is self-contained. The agent doesn't need to hold the entire project in context - just the current task and its requirements. This sidesteps the context window limitations that plague complex coding sessions.

Fault tolerance. If one iteration fails, the task goes back to pending and the loop continues. The system can recover from agent errors, timeouts, or partial completions.

Incremental progress. Each completed task is committed and pushed. You can stop the loop at any time and have working, versioned code. No risk of losing hours of work to a crashed session.

Natural parallelisation potential. While the current implementation is sequential, the architecture naturally supports running multiple agents in parallel, each claiming different pending tasks.
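For instance, because a rename on the same filesystem is atomic, the claim step already doubles as a lock: if two workers race for the same task file, only one mv succeeds. A hypothetical helper along these lines (the current demo doesn't do this) might look like:

    # Hypothetical helper for parallel workers: claim one pending task atomically.
    # rename(2) is atomic, so if two workers grab the same file, only one mv succeeds;
    # the loser simply moves on to the next pending task.
    claim_task() {
      local pending processing
      for pending in todo/*.pending.md; do
        [ -e "$pending" ] || return 1                      # glob matched nothing: no pending tasks
        processing="${pending%.pending.md}.processing.md"
        if mv "$pending" "$processing" 2>/dev/null; then
          printf '%s\n' "$processing"                      # we own this task now
          return 0
        fi
      done
      return 1                                             # every pending task was claimed first
    }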

The Implications

What struck me about Ralph is how it shifts the role of the developer. Instead of writing code, you're writing task descriptions and reviewing commits. The skill becomes decomposing problems effectively and writing clear specifications.

This is reminiscent of how senior developers already work - spending more time on architecture and requirements than implementation. Ralph just takes it to its logical conclusion.

The Excel clone built in an hour wasn't production-ready, but it was on the right track with working basic functionality. It could have gone further had I let it burn through more tokens. This suggests we're approaching a world where the cost of software is measured in API credits rather than developer hours.

Trying It Yourself

The ralph-demo repo has everything you need to get started. The approach is simple enough to recreate or adapt. You need:

  1. A seed prompt that asks the AI to decompose your project into task files
  2. A bash loop that runs an AI agent with a prompt telling it to claim, execute, and complete tasks
  3. A todo/ folder structure with .pending.md, .processing.md, and .completed.md conventions
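To give a feel for the third piece, here's the kind of task file the seed phase might drop into todo/ - the file name and contents are invented for illustration:

    # Illustrative only: writing an example task file by hand to show the expected shape.
    cat > todo/003-basic-cell-editing.pending.md <<'EOF'
    # Task: basic cell editing

    Depends on: 001-grid-rendering, 002-cell-selection

    - Clicking a cell puts it into edit mode
    - Enter commits the value, Escape cancels
    - Committed values re-render in the grid
    EOF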

The key insight is that AI agents work better with clear, bounded tasks than open-ended coding sessions. By externalising the task queue to the filesystem and letting the agent work through it iteratively, you get surprisingly robust results.

This feels like an important pattern - not just for AI coding, but for how we'll interact with AI systems generally. Give them structure, let them work incrementally, and design for graceful failure.

What experiments have you run with autonomous AI agents?