Ralph Loops and Multi-Week AI Coding
Ralph loops (credit to Geoffrey Huntley) have made it practical for AI agents to work on software projects that take weeks, not minutes. Runs of three weeks or more are now happening, and this is just the beginning.
The outcome
Outcomes first: you can say "build me a new operating system", set it off for weeks, and it comes back with a solution. There's a bit more to it than that - but not much. If you craft the right initial prompt, you can let it run on its own.
How it works
- You get an agent to write a list of tasks in plain language, e.g. "write me a list of tasks to build a Slack clone that only uses voice".
- The agent takes the next unfinished task, completes it, and marks it done.
- After each task, it saves the changes and leaves a short note on what it did.
- If it can’t finish a task, it records why and retries later.
- The loop repeats until the list is done (a minimal sketch of the loop follows below).
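Here is a minimal sketch of that loop in Python. Everything in it is an assumption for illustration: `run_agent` stands in for whatever CLI or API your coding agent actually exposes, and the task list is a plain-text file with one task per line, prefixed `[ ]` or `[x]`.

```python
import subprocess
from pathlib import Path

TASKS = Path("tasks.md")   # one task per line: "[ ] task" or "[x] task"
NOTES = Path("notes.md")   # short running log of what each step did

def run_agent(prompt: str) -> str:
    """Hypothetical: send a prompt to your coding agent and return its reply."""
    raise NotImplementedError

def next_task(lines: list[str]):
    """Find the first unfinished task, if any."""
    for i, line in enumerate(lines):
        if line.startswith("[ ]"):
            return i, line[3:].strip()
    return None, None

while True:
    lines = TASKS.read_text().splitlines()
    i, task = next_task(lines)
    if task is None:
        break  # the list is done
    summary = run_agent(f"Complete this task, then summarise what you did:\n{task}")
    lines[i] = "[x] " + task                          # mark the task done
    TASKS.write_text("\n".join(lines) + "\n")
    with NOTES.open("a") as f:                        # leave a short note
        f.write(f"- {task}: {summary}\n")
    subprocess.run(["git", "add", "-A"], check=True)  # save the changes
    subprocess.run(["git", "commit", "-m", f"ralph: {task}"], check=True)
```

The commit after every task is what makes the whole thing recoverable: if a run dies, the next one picks up from the last good state.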
You can actually have two loops: one that plans more tasks and one that executes those tasks.
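A sketch of the planner half, reusing `TASKS` and the hypothetical `run_agent` from above - it tops up the list whenever the backlog runs low (the threshold is arbitrary):

```python
def plan_more(goal: str) -> None:
    """Planner loop body: ask the agent for new tasks when the list runs low."""
    open_tasks = [l for l in TASKS.read_text().splitlines() if l.startswith("[ ]")]
    if len(open_tasks) < 3:  # replenish threshold; pick whatever suits you
        new = run_agent(f"Goal: {goal}\nWrite the next few tasks, one per line.")
        with TASKS.open("a") as f:
            for line in new.splitlines():
                if line.strip():
                    f.write(f"[ ] {line.strip()}\n")
```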
You can also start a number of agents that each work on separate tasks. This works, but coordination and conflicts become a problem (one mitigation is sketched below).
Do agents now need their own standup?
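One mitigation - an assumption on my part, not part of the original write-up - is to make each agent atomically claim a task before touching it, for example by exclusively creating a claim file:

```python
import os
import hashlib
from pathlib import Path

CLAIMS = Path("claims")
CLAIMS.mkdir(exist_ok=True)

def try_claim(task: str) -> bool:
    """Atomically claim a task: O_EXCL means only one agent can create the file."""
    claim = CLAIMS / hashlib.sha1(task.encode()).hexdigest()
    try:
        os.close(os.open(claim, os.O_CREAT | os.O_EXCL | os.O_WRONLY))
        return True
    except FileExistsError:
        return False  # another agent claimed it first
```

This only stops two agents from picking the same task; edits that touch the same files can still conflict, which is why per-agent branches and a merge step are common.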
Why it works
- Small tasks fit. An LLM does better with one clear outcome, because a small task fits comfortably in its context window.
- The next task starts from "here". Each step only needs to know where the project is up to, not the entire history - important, because an LLM can only fit so much text into its context window.
- Progress is saved. The project moves forward in small steps, so a failure doesn’t wipe out days of work.
- It can recover. When something goes wrong, the next run can retry with better instructions (a sketch follows this list).
- People stay in control. Or do they? You can have another Ralph loop that creates new tasks on its own.
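The recovery point is mostly prompt plumbing: record why a task failed and feed that back into the next attempt. A sketch, again using the hypothetical `run_agent`, with `FAILED:` as an assumed convention for the agent to report failure:

```python
failures: dict[str, str] = {}  # task -> reason the last attempt failed

def attempt(task: str) -> bool:
    """One attempt at a task, feeding any recorded failure back into the prompt."""
    prompt = f"Complete this task:\n{task}"
    if task in failures:
        prompt += f"\nA previous attempt failed: {failures[task]}\nTry a different approach."
    result = run_agent(prompt)                 # hypothetical agent call, as above
    if result.startswith("FAILED:"):           # assumed reporting convention
        failures[task] = result.removeprefix("FAILED:").strip()
        return False                           # leave the task open for a later pass
    return True
```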
Gas Town
If you want to read about some extreme uses of these loops, go read about Steve Yegge's Gas Town, where you run a factory of these AI agents collaborating.
What it could lead to
As this gets better, the limiting factor won’t be typing code. It will be choosing what to build, breaking it down well, and judging quality.
That could mean:
- Faster shipping for smaller teams.
- More work happening in the background: fixes, cleanup, and small improvements on autopilot.
- Costs shifting toward running these systems and reviewing their output.
- A bigger need for strong quality gates, because speed can ship mistakes faster too. Think testing bots and security bots (a minimal gate is sketched below).
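A quality gate can start very small: refuse to mark a task done until the checks pass. A sketch assuming a standard pytest setup - the same hook is where a linter or security scanner would go:

```python
import subprocess

def quality_gate() -> bool:
    """Run the test suite; only mark a task done when it passes."""
    return subprocess.run(["pytest", "-q"]).returncode == 0
```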