LoopFlow, the tutorial

Stop tuning prompts. Start editing the loop. Hands-on? → Workshop 🛠️ · Or play → LoopFlow Lab 🎮

LoopFlow is a small natural-language DSL for loop engineering. A .loop file describes a self-correcting, human-gated coding workflow — its objective, the context it may read, the actions it may take, how it verifies itself, and when it stops. This page teaches the whole language from the first line to full A-to-Z pipelines, every section grounded in a real example you can run.

a claude session

…or one loop

# run in chat:  /loopflow run rate-limit.loop
# (the agent knows LoopFlow from AGENTS.md — your repo's memory)

git:
  work on a branch
  commit when the goal is met
  open a pull request

loop "add API rate limiting":
  goal: requests are rate-limited per API key
  done when "pnpm test rate-limit" passes
  look at: the API and its middleware, and the last failure
  allow edits automatically, but ask me before pushes
  each cycle: plan, then act, then observe
  also: run a security check
  when it fails: reflect, then plan again
  after 6 tries: stop and warn "thrashing"

Skill: /loopflow runs it · Memory: AGENTS.md + reflect · Git: branch + commit + PR, never main

Left: a dozen messages, and it still pushed to main. Right: the whole job, written once — it runs, verifies, and stops only when it's really done.

First 5 minutes — in any repo (Node 18+, the Claude Code CLI):

npx @loop-lang/loop init      # installs the /loopflow skill + AGENTS.md
# then, in a Claude Code chat:
/loopflow fix the failing test — done when the suite passes

Watch it plan → act → observe, reflect on a red test, and stop only when the check is green. Skeptical? → Why not just prompt? · Full setup in Getting started.

What is a loop (basic)

AI writes the code now — but you are still the conductor. Every coding task is really five decisions:

Decision	In a .loop	Question it answers
Objective	`goal:`	What are we trying to do?
Context	`look at:`	What may the agent read first?
Actions	`allow… / ask me before…`	What may it do, and what needs a human?
Verification	`done when`	How do we know it worked?
Stopping	`when… / after N tries`	When do we stop — done, or thrashing?

Here are all five decisions as one real loop — every line is one of the rows above:

loop "fix the failing test":                           # the work
  goal: the cart total is correct with a coupon        # Objective
  look at: the checkout code, and the last failure     # Context
  allow edits automatically, but ask me before pushes  # Actions
  done when the cart coupon tests pass                 # Verification
  after 6 tries: stop and warn "stuck"                 # Stopping

Every cycle runs plan → act → observe. The done when check decides: pass → stop; fail → reflect, which feeds the error into the next plan. A thrash guard (after N tries) stops it if it gets stuck — so a loop never spins forever.

When the model took over writing the code, those five decisions got buried in prompts. LoopFlow promotes them to first-class, editable knobs. At runtime they drive the five phases in the diagram above: plan → act → observe → reflect → stop. You don't re-tune a prompt; you edit the loop.

The two ideas to keep

Edit the loop, not the prompt. The control structure is the artifact.
You can't fake done. done when runs a real command — a test, a scanner, a script. The loop stops only when the world agrees.

Prompt vs LoopFlow — why not just prompt? (why)

You could just say "fix the bug." So why write a loop? A prompt fires once and trusts the model's word that it's done. A loop verifies, self-corrects, and stops only when the work is provably finished.

	Just prompting	A loop
"Done" means	the model says "done"	a real command passes — `done when "…"`
On failure	you notice, re-prompt, repeat	reflects on the failure, re-plans automatically
Stops	when the model stops typing	when the check is green — or warns after N tries
Risky actions	hope it asks first	gated; never pushes to main/master
Scope	wanders the codebase	`look at:` keeps it in your module
Repeatable	re-type it, get drift	re-run the same file, same shape
Shareable	a paragraph in Slack	a `.loop` in the repo, reviewable in a PR

Same task, both ways

The prompt:

"Fix the failing checkout tax test in src/checkout and make sure nothing else breaks."
→ the agent edits, replies "Done — fixed the rounding." Did checkout.spec.ts::tax actually pass? The whole suite? You re-run it yourself. Failed? Re-prompt. (And it may have run git push on the way.)

The loop:

loop "fix the checkout tax test":
  goal: the checkout tax test passes with no regressions
  done when the checkout tax tests pass
  look at: the checkout code, and the last failure
  each cycle: plan, then act, then observe
  when it fails: reflect, then plan again
  after 6 tries: stop and warn "tax fix thrashing"

Runs the test every cycle. Fails → reflects on why → fixes again. Stops only when the test is green (or warns after 6). Works on a branch, never touches main. The same file works tomorrow, and a teammate can read exactly what "done" meant.

The point

Prompting asks for an answer. A loop guarantees a result.

vs Claude Code's `/loop` and `/goal` (why)

Claude Code already has two looping built-ins — they're useful, and they're not the same tool. /loop is a scheduler (re-run a prompt every few minutes). /goal is the closest cousin: keep going until a condition holds. The catch with /goal — its condition is judged by a fast model reading the transcript; it can't run your test or open a file. So "done" is what the model says, not a command that passed.

	`/loop`	`/goal`	LoopFlow
What it's for	run a prompt on a schedule	loop until a condition reads true	a verified, gated, reusable workflow
"Done" means	never — you stop it	a model judges your condition from the transcript	a real command passes — `done when "pnpm test" passes`, can't be faked
On failure	fires again next interval	next turn; no introspection	reflect on the failure, then re-plan (the back-edge)
Human gate mid-run	no	no — fully autonomous	yes — `a human approves the plan first`
Never push to `main`	no	no	built-in, unconditional
Reusable / shareable	no	no — ephemeral per session	a version-controlled `.loop` — run in any repo, save to your library
Multi-step	—	one condition	pipelines, flows, `for each`

When to reach for which

/loop — polling and cadence ("check the deploy every 5 minutes").
/goal — a quick, throwaway "keep going until it looks done" in this session.
LoopFlow — when "done" must be provable, the loop must self-correct, a human gates the risky step, and you want to keep and reuse the workflow. A .loop is /goal with a real check, a retry, a gate, and a file.

Getting started (setup)

That's the why. From here down is the deep dive — set LoopFlow up once, then learn the language line by line.

Prerequisites: Node 18+, the Claude Code CLI (LoopFlow drives it), and a git repo to work in. There are two ways to run a loop: inside a Claude Code chat (the bundled skill, recommended), or by hand in VS Code with the extension. One command installs everything.

1 · Install with npm

From your repo, run the installer. In one step it writes the /loopflow skill and AGENTS.md (the full language reference) into the project:

npx @loop-lang/loop init            # install into this repo
npx @loop-lang/loop init --global   # or: install the /loopflow skill for every repo

That gives you two things at once. /loopflow is now available in a Claude Code chat here. And AGENTS.md sits at the repo root, where it travels with the project: any agent that opens the repo already knows the LoopFlow language, so the file acts as the project's persistent memory of how to write a .loop. (Methods are shared the same way: `use the <X> method` pulls in a .loop preset that another repo can reuse.)

2 · Run your first loop

/loopflow fix the failing auth test in src/auth, gate any database migration   # writes a .loop
/loopflow run examples/fix_test.loop                                          # runs it, in the chat

Describe the work and the skill writes the .loop; name a file and it runs the loop natively in the session — you watch every plan → act → observe → reflect step and answer human gates right in the chat. Prefer a terminal? The same files run headless via the CLI (loop-run run <file>) — see Running a loop.

That first command writes a file like this — yours to read, edit, and re-run:

# /loopflow "fix the failing auth test in src/auth, gate any database migration" writes:
loop "fix the failing auth test":
  goal: the auth suite passes in src/auth
  done when "pnpm test src/auth" passes
  look at: the auth code, and the last failure
  ask me before migrations
  each cycle: plan, then act, then observe
  when it fails: reflect, then plan again
  after 6 tries: stop and warn "stuck"

3 · Author in VS Code (supported)

A .loop is just text — let the agent draft it, or write one by hand. Install the LoopFlow extension from the VS Code Marketplace — Extensions ▸ search “LoopFlow” ▸ Install — then open any .loop file:

Syntax highlighting for the whole language — goal, done when, gates, flow, for each, git:.
Completion + formatting — keyword snippets so you type a loop from muscle memory, not a cheat sheet.
A ▶ Run button on every loop (the LoopFlow: Run this loop CodeLens) — and you choose where it runs via the loop.runMode setting: open an interactive Claude Code session, or a headless output-panel trace.
New from template (command palette → LoopFlow: New from template) — scaffold a ready-to-edit .loop bundle.
A soft linter that nudges, never blocks — "this loop has no way to verify it's done", "add a thrash guard".

Same file, same engine: prompt the agent to generate a loop, or open VS Code and write the .loop yourself. The chat is the first-class way to run; the extension is the first-class way to author by hand.

Safe from the first run: with no git: block, LoopFlow works on a branch and commits when the goal is met, and never pushes to main/master — see the git keyword.
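The "never pushes to main/master" rule can be pictured as a small pre-flight check. The sketch below is an assumption about the behaviour, not LoopFlow's actual code: the function name `may_push` and the `PROTECTED` set are hypothetical names chosen for illustration.

```python
# Hypothetical sketch of a push guard; LoopFlow's real guard may differ.
PROTECTED = {"main", "master"}  # branches the tutorial says are never pushed to


def may_push(current_branch: str) -> bool:
    """Return True only when the current branch is not a protected one."""
    return current_branch.strip() not in PROTECTED


if __name__ == "__main__":
    for branch in ("main", "master", "loopflow/fix-tax-test"):
        verdict = "push allowed" if may_push(branch) else "push refused"
        print(f"{branch}: {verdict}")
```

In a real run the branch name would come from git (for example `git rev-parse --abbrev-ref HEAD`); the check itself stays this simple.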

Rather learn hands-on?

Two guided ways to get the language into your fingers before the deep dive below:

🛠️ Workshop: learn by doing → build a small todo app, one loop at a time
🎮 LoopFlow Lab: learn by playing → an interactive game of loop engineering

`done when` — how a loop verifies itself (basic)

The predicate is the spine of the whole idea. Four forms:

done when the test "billing.spec.ts::apostrophe" passes   # a named test (runs via your test runner)
done when "pnpm test" passes                               # a shell command, success = exit 0
done when "semgrep --severity=high" finds nothing          # exit 0 AND empty output
done when a human confirms "the UI looks right"            # a person is the check

Why this matters

A predicate is a real command run with your privileges — like an npm script or a Makefile target. So treat a .loop from an untrusted source as you would their shell scripts.
`finds nothing` is how you say "this scanner must report zero": it requires both exit 0 and empty output.
The `a human confirms "…"` form is decided by a person. It is satisfied when you run the loop in conversation; the headless shell verifier returns `human check required: …` and never passes on its own.
A loop with no done when has no machine check, so it must finish through a human path — a review gate, an approved plan-first pass (a plan-only loop), or an explicit when …: stop — otherwise it runs to the hard cap. Always give it a real check when one exists.
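The rules above for `passes`, `finds nothing`, and the headless human check can be sketched as three small functions. This is a minimal illustration under stated assumptions, not LoopFlow's verifier: the names `Verdict`, `check_passes`, `check_finds_nothing`, and `check_human_headless` are hypothetical, and "empty output" is taken to mean empty standard output.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class Verdict:
    passed: bool
    detail: str


def _run(command: str) -> subprocess.CompletedProcess:
    # Run through the shell, as an npm script or Makefile target would.
    return subprocess.run(command, shell=True, capture_output=True, text=True)


def check_passes(command: str) -> Verdict:
    """`done when "<cmd>" passes`: success is exit code 0."""
    proc = _run(command)
    return Verdict(proc.returncode == 0, f"exit {proc.returncode}")


def check_finds_nothing(command: str) -> Verdict:
    """`done when "<cmd>" finds nothing`: exit code 0 AND empty output."""
    proc = _run(command)
    # Assumption: "output" means stdout; the tutorial does not say whether stderr counts.
    silent = proc.stdout.strip() == ""
    state = "empty" if silent else "present"
    return Verdict(proc.returncode == 0 and silent,
                   f"exit {proc.returncode}, output {state}")


def check_human_headless(question: str) -> Verdict:
    """`done when a human confirms "..."`: never passes without a person."""
    return Verdict(False, f"human check required: {question}")


if __name__ == "__main__":
    print(check_passes("exit 0"))
    print(check_finds_nothing("echo finding"))
    print(check_human_headless("the UI looks right"))
```

Note that `echo finding` exits 0 yet still fails the `finds nothing` check, which is the point of requiring both conditions.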

The cycle and the reflect back-edge (intermediate)

`each cycle:` lists the steps in order, using any subset of plan, act, and observe:

each cycle: plan, then act, then observe   # full self-correcting unit
each cycle: act, then observe              # skip planning — just do + check

plan — read the look at: files, decide the smallest change toward the goal. (Runs read-only.)
act — make the change, honoring the policy.
observe — run the done when check and read pass/fail.

On a failed observe, the rule `when it fails: reflect, then plan again` fires. reflect reads the failure output and writes a short diagnosis; that diagnosis becomes context for the next plan. This is the back-edge (the orange arc in the diagram), and it is the difference between an agent that retries blindly and one that learns from each miss.

Safety net: regardless of your rules, the engine has an absolute hard cap of 25 cycles per loop, so a loop can never spin forever.
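The control flow described in this section (plan, act, observe, the reflect back-edge, the thrash guard, and the hard cap) can be sketched in a few lines. This is an illustrative model only: `run_loop` and its callback parameters are hypothetical, and treating `after N tries` as "stop after the Nth failed cycle" is an assumption.

```python
from typing import Callable, Optional, Tuple

HARD_CAP = 25  # the absolute per-loop cycle cap stated in the tutorial


def run_loop(
    plan: Callable[[Optional[str]], str],
    act: Callable[[str], None],
    observe: Callable[[], Tuple[bool, str]],
    reflect: Callable[[str], str],
    max_tries: int = 6,
) -> str:
    """Return "done", "thrashing", or "hard cap"."""
    diagnosis: Optional[str] = None
    for cycle in range(1, HARD_CAP + 1):
        step = plan(diagnosis)       # the plan sees the previous diagnosis, if any
        act(step)
        passed, output = observe()   # stands in for the `done when` check
        if passed:
            return "done"
        if cycle >= max_tries:       # `after N tries: stop and warn`
            return "thrashing"
        diagnosis = reflect(output)  # the back-edge: failure feeds the next plan
    return "hard cap"


if __name__ == "__main__":
    attempts = []

    def plan(diagnosis):
        return f"attempt {len(attempts) + 1} (diagnosis: {diagnosis})"

    def act(step):
        attempts.append(step)

    def observe():
        # A stand-in check that goes green on the third cycle.
        return (len(attempts) >= 3, "tax rounding still off")

    def reflect(output):
        return f"check said: {output}"

    print(run_loop(plan, act, observe, reflect))  # prints "done"
    print(len(attempts))                          # prints 3
```

Even when `max_tries` is set higher than 25, the `for` range ends the run, which mirrors the claim that the hard cap applies regardless of your rules.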

Human gates (human-in-the-loop)

LoopFlow has five places where a person steps in. The first two go inside a loop; the third is a stage gate; the fourth is a transition; the fifth is the per-run confirm prompt from the action policy (Section 4), `ask me before …`, which is asked once and remembered.

a human approves the plan first      # approve the plan before any acting
a human reviews before stopping      # judge the result before the loop may stop
a human approves before provisioning # a hard, blocking gate (used on a stage)
when blocked: ask a human            # unblock when the agent is stuck

Where to use which

a human approves the plan first — high-stakes work where the plan must be right before touching anything.
a human reviews before stopping — subjective "looks right" goals (UI, copy) where no command can decide done.
a human approves before <X> — a blocking gate before a whole stage runs (deploys, provisioning).
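The fifth gate, the `ask me before …` confirm that is asked once and remembered, behaves like a small cache in front of a question. The sketch below is a hypothetical model, not LoopFlow's policy engine: the `ActionPolicy` class, its action names, and the refusal of unlisted actions are all assumptions made for illustration.

```python
from typing import Callable, Dict, Set


class ActionPolicy:
    """Auto-allow some actions; ask once per run for gated ones."""

    def __init__(self, auto: Set[str], gated: Set[str], ask: Callable[[str], bool]):
        self.auto = auto
        self.gated = gated
        self._ask = ask
        self._answers: Dict[str, bool] = {}  # remembered for the rest of the run

    def permits(self, action: str) -> bool:
        if action in self.auto:
            return True
        if action in self.gated:
            if action not in self._answers:
                self._answers[action] = self._ask(action)
            return self._answers[action]
        # Assumption: anything unlisted is refused; the tutorial does not define this case.
        return False


if __name__ == "__main__":
    asked = []

    def ask(action: str) -> bool:
        asked.append(action)
        return True  # stand-in for a person answering "yes" in the chat

    policy = ActionPolicy(auto={"edit"}, gated={"push"}, ask=ask)
    print(policy.permits("edit"), policy.permits("push"), policy.permits("push"))
    print(asked)  # prints ['push']: the second push reused the remembered answer
```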

Composing — pipeline, flow, for each (scaling up)

One loop handles one job. Three constructs scale it up — each a keyword with its own reference page:

pipeline — run stages in order; a failing stage halts the rest. An epic → a pipeline, each story → a stage. (Example below.)
flow — chain whole .loop files; each step's summary carries forward (discover → design → build).
for each — run a template loop once per item in a YAML/Markdown plan — A-to-Z over every story.

pipeline "ship feature":

  stage security:
    goal: no high or critical vulnerabilities
    done when "semgrep --severity=high" finds nothing
    each cycle: plan, then act, then observe
    when it fails: reflect, then plan again

  stage build:
    goal: feature works and tests pass
    a human approves the plan first
    each cycle: act, then observe
    done when "pnpm test" passes

  stage ui:
    goal: matches design, responsive at 375px
    each cycle: plan, then act, then observe
    a human reviews before stopping

  stage deploy:
    a human approves before provisioning
    goal: infra live and healthchecks green
    done when "./scripts/health.sh" passes
    each cycle: act, then observe

examples/ship_feature.loop

An epic → a pipeline, each story → a stage. Stages run left to right; each carries its own done when check and human gates (👤). A green arrow means "passed — go on"; a failure stops the pipeline there.
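Two of the composition rules above are easy to state in code: a pipeline halts at the first failing stage, and a flow hands each step's summary to the next. This is a minimal sketch under assumptions, with hypothetical names (`run_pipeline`, `run_flow`) and stages reduced to plain callables; the real engine runs a full loop per stage.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[Tuple[str, str]]:
    """Run stages in order; once one fails, the remaining stages are skipped."""
    results: List[Tuple[str, str]] = []
    halted = False
    for name, run in stages:
        if halted:
            results.append((name, "skipped"))
            continue
        ok = run()
        results.append((name, "passed" if ok else "failed"))
        halted = not ok
    return results


def run_flow(steps: List[Callable[[str], str]]) -> str:
    """Chain steps; each receives the previous step's summary and returns its own."""
    summary = ""
    for step in steps:
        summary = step(summary)
    return summary


if __name__ == "__main__":
    outcome = run_pipeline([
        ("security", lambda: True),
        ("build", lambda: False),   # a red stage halts everything after it
        ("ui", lambda: True),
        ("deploy", lambda: True),
    ])
    print(outcome)
    # prints [('security', 'passed'), ('build', 'failed'), ('ui', 'skipped'), ('deploy', 'skipped')]

    print(run_flow([
        lambda prev: "discover: found 3 stories",
        lambda prev: prev + " | design: 3 specs",
        lambda prev: prev + " | build: 3 loops",
    ]))
```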

The full grammar — pipelines, flow chains, and for each iteration, with worked examples — is in the manual and the keyword reference.

In practice — real workflows (walkthroughs)

You don't write LoopFlow all day. You reach for it when a task has a clear "done." Here's how it slots into the work you already do.

A ticket from Jira (the daily driver)

You picked up PROJ-412 — "Applying a coupon can make the cart total negative." Turn the ticket into a loop:

Describe it. In the chat: /loopflow PROJ-412: a coupon must never make the cart total negative; done when the coupon-floor tests pass — the skill writes the .loop. (Or write it yourself — the ticket's acceptance criterion becomes done when, in plain words.)
What it produces:

loop "PROJ-412: coupon must not make the cart total negative":
  goal: applying a coupon never produces a negative cart total
  done when the coupon-floor tests pass
  look at: the cart total logic and the coupon code, and the last failure
  allow edits automatically, but ask me before migrations or pushes
  each cycle: plan, then act, then observe
  when it fails: reflect on which layer broke, then plan again
  after 6 tries: stop and warn "PROJ-412 thrashing — check the spec"

Run it: /loopflow run proj-412.loop. Watch plan → act → observe; answer the migration confirm if it asks.
It lands on a branch and commits when the test passes (the default git). Add a `git:` block with `push when done` and `open a pull request` to get a PR, then paste that link back into the ticket.

No failing test yet? Tell the loop to write one first — a tiny "reproduce PROJ-412 with a failing test" loop, then the fix loop. Now done when has something real to check.

Built with LoopFlow — Forge's sandbox runner case study

This isn't hypothetical. Forge — a ticket-driven implementation platform (you hand it a ticket, agents implement it) — is itself built with LoopFlow. One of its sharper-edged modules is the sandbox runner: the infrastructure that executes agent-written code in isolation, so untrusted code can run without ever touching the host. I built it as a pipeline, one provable stage at a time:

# examples/forge-sandbox.loop
pipeline "forge sandbox runner":
  stage isolate:
    goal: every run gets a fresh, network-less container with CPU and memory caps
    look at: the runner service and the container config, and the last failure
    each cycle: plan, then act, then observe
    done when "pnpm test sandbox/isolation" passes
    when it fails: reflect, then plan again
    after 6 tries: stop and warn "isolation thrashing"
  stage execute:
    goal: run agent code, capture stdout, stderr and exit code, kill on timeout
    each cycle: act, then observe
    done when "pnpm test sandbox/execute" passes
  stage harden:
    goal: no container escape, no host filesystem or cloud-metadata access
    also: a security scan
    done when "pnpm test sandbox/security" passes
    a human approves before enabling network egress
  stage integrate:
    goal: a real ticket's generated code runs end to end inside the sandbox
    done when "pnpm test:e2e sandbox" passes
    a human reviews before stopping

Why a pipeline, not one loop: a sandbox is only as trustworthy as the stage you trust least. isolate has to be green before execute is even attempted — a failing stage halts the rest, so the runner is never "half-isolated." harden pairs a security suite with a scan and gates network egress on a human — the one call I never wanted an agent to make alone. integrate won't declare done until a real ticket's generated code actually runs inside the box, with me reviewing before it stops.

Every stage carried its own done when, so "done" meant a green check, not a vibe — and when the escape test failed, the loop reflected on why and re-planned, instead of me re-prompting from scratch. The whole module is now a single file I re-run whenever the base image changes.

examples/forge-sandbox.loop — the real pipeline behind Forge's sandbox executor. Built with LoopFlow, gated where it counts.

LoopFlow by role — where it earns its keep (examples)

Anywhere "done" is a command, LoopFlow fits. A few real shapes by role — each a runnable .loop you'd write in seconds (or have the agent write):

Backend

Ship an endpoint against its tests; gate the migration.

loop "add POST /orders":
  goal: the endpoint creates an order and returns 201
  done when "pytest tests/api/test_orders.py" passes
  look at: the orders router and schema, and the last failure
  ask me before I run a database migration
  each cycle: plan, then act, then observe
  when it fails: reflect, then plan again
  after 6 tries: stop and warn "orders endpoint stuck"

Mobile / frontend

Build a screen until its widget tests are green.

loop "build the login screen":
  goal: the login screen validates input and matches the spec
  done when "flutter test test/login_test.dart" passes
  look at: the login widget and the design spec, and the last failure
  each cycle: plan, then act, then observe
  when it fails: reflect, then plan again
  after 6 tries: stop and warn "login screen stuck"

DevOps

A gated infra change — the scan must pass, and a human approves before it touches staging.

pipeline "harden the staging cluster":
  stage scan:
    goal: no high-severity misconfigurations in the manifests
    done when "kube-score score manifests/*.yaml" passes
    each cycle: plan, then act, then observe
    when it fails: reflect, then plan again
  stage apply:
    goal: the change is live on staging
    a human approves the plan first
    done when "kubectl rollout status deploy/web -n staging" passes

QA

Turn a bug report into a reproducing test, then make it pass — "done" is the test, not a vibe.

loop "reproduce + fix BUG-481: coupon makes the total negative":
  goal: a regression test reproduces the bug, then the fix makes it pass
  done when "pnpm test cart/coupon" passes
  look at: the cart total logic and the bug report
  each cycle: plan, then act, then observe
  when it fails: reflect, then plan again
  after 6 tries: stop and warn "BUG-481 stuck"

Security

A scan that must find nothing — save it to your library and run it in every repo.

loop "security pass on the auth module":
  goal: no high or critical findings in src/auth
  done when "semgrep --config p/owasp-top-ten --severity=high src/auth" finds nothing
  also: a security scan
  each cycle: plan, then act, then observe
  when it fails: reflect, then plan again
  after 6 tries: stop and warn "findings remain"

Notice the shape is identical across roles: a goal, a real done when, the reflect back-edge, a guard, and a gate where it's risky. Learn it once; it travels to whatever you build.

Running a loop (intermediate)

Every command below runs the same kind of file — here's the one referenced throughout this section:

# examples/fix_test.loop
loop "fix the failing checkout tax test":
  goal: the tax line is correct at checkout
  done when the checkout tax tests pass
  each cycle: plan, then act, then observe
  when it fails: reflect, then plan again
  after 5 tries: stop and warn "stuck"

In conversation (recommended)

With the LoopFlow skill loaded, run it inside the chat — the assistant executes the cycle itself, narrates every step, and you answer gates inline:

/loopflow "fix the failing checkout tax test"   # write a .loop from a request
/loopflow run examples/fix_test.loop            # run it, right here in the chat

You stay in the chat the whole time: you watch each step, you answer the gate, and the loop ends on a real green test — not the assistant saying "done". This is the only mode where a long interactive discovery works, because the human is already in the loop.

Headless CLI

loop-run run   examples/ship_feature.loop  # drive Claude Code, glyph trace
loop-run show  examples/ship_feature.loop  # print the loop's flow as compact ASCII
loop-run ls                                # list every .loop in the repo + its shape
loop-run parse examples/ship_flow.loop     # print the parsed spec (the loop-spec IR)
loop-run viz   examples/ship_flow.loop     # write a self-contained HTML schematic
# flags: --model, --out, --events (NDJSON for a UI host), --json
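For a UI host that wants to consume the `--events` stream, NDJSON means one JSON object per line. The reader below is a generic sketch: `read_events` is a hypothetical helper, the sample field names (`phase`, `cycle`, `passed`) are invented for the demo, and the tutorial does not document the real event schema or where the stream is written.

```python
import json
from typing import Dict, Iterable, Iterator


def read_events(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one dict per NDJSON line; skip blanks and lines that are not JSON objects."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # tolerate stray non-JSON output mixed into the stream
        if isinstance(event, dict):
            yield event


if __name__ == "__main__":
    # Hypothetical sample lines: the real event fields are defined by LoopFlow, not here.
    sample = [
        '{"phase": "plan", "cycle": 1}',
        "",
        "not json",
        '{"phase": "observe", "cycle": 1, "passed": false}',
    ]
    for event in read_events(sample):
        print(event)
```

Because the function accepts any iterable of strings, it works equally on a file, a pipe such as `sys.stdin`, or a list in a test.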

In VS Code

Syntax highlighting, hover docs, and tab-completion for every construct.
A ▶ Run CodeLens above each definition. When you click it, you choose where it runs (the loop.runMode setting): open a Claude Code session — an interactive run in the integrated terminal where you watch every step and answer gates in chat — or the VS Code output panel (headless, with a live trace and native gate dialogs). Set loop.runMode to ask (prompt each time), session, or output.
A soft linter that nudges (never blocks): "this loop has no way to verify it's done", "add a thrash guard".
LoopFlow: New from template — scaffold a ready-to-edit method bundle (a generic for each setup, or a BMAD A-to-Z one) into your workspace in one pick.

Troubleshooting

Symptom	Fix
`/loopflow` isn't recognized in the chat	The skill isn't installed. Run `npx @loop-lang/loop init` (or `--global`) and reopen the chat.
The loop runs forever / hits the attempt cap	It has no real `done when`, so nothing can decide "done". Pin it to a test or command, and add an `after N tries` guard.
A `done when a human confirms` check never passes headless	Human predicates need a person — run it in the chat (or VS Code session mode), not the headless output panel.
`push when done` fails before the loop even starts	You're on `main`/`master`. LoopFlow refuses to push to a protected branch — switch to a feature branch (or drop `push when done`).
`loop-run` isn't found in the terminal	The headless CLI ships with `@loop-lang/runtime`: `npm i -g @loop-lang/runtime`. The chat (`/loopflow`) needs nothing extra.

Your global library — save a loop, run it in any repo (reuse)

You wrote a loop that audits a repo for security holes. It verifies, it self-corrects, it never pushes to main. You'll want it again next week, in a different project. Don't copy the file around — save it to your global library and call it by name from anywhere.

The library is a folder Claude owns: ~/.claude/loopflow/, one <name>.loop per saved loop. It sits next to the skill you installed once, so it's there in every project. You never edit it by hand — you drive it through /loopflow in the chat:

/loopflow save this as security      # store the current loop as ~/.claude/loopflow/security.loop
/loopflow list                       # what have I saved?
/loopflow run security               # run my security loop against THIS repo
/loopflow remove security            # delete it

claude code — /loopflow list

❯ /loopflow list

~/.claude/loopflow — 3 saved loops

security — no high/critical vulns · plan→act→observe, reflect, done-when

flaky-test — quarantine + fix a flaky test · reflect, stop after 6 tries

dep-bump — bump deps, keep the suite green · human gate before push

Now open a brand-new project and type /loopflow run security. Claude reads your saved loop and runs it here — plan → act → observe, reflecting on each failure, stopping only when your done when check is green, asking before anything risky. Same loop, new repo, every guarantee intact. A bare name means your library; a path or a .loop file still means a local file, so the library never shadows a loop that lives in the repo.
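The lookup rule in the last sentence (a bare name means the library, anything that looks like a path or a .loop file means a local file) can be sketched as a single function. This is an assumption about how the rule might be applied, not the skill's actual logic; `resolve_target` and its "looks like a path" test are hypothetical.

```python
from pathlib import Path

# The library folder named in the tutorial.
LIBRARY = Path.home() / ".claude" / "loopflow"


def resolve_target(arg: str, library: Path = LIBRARY) -> Path:
    """Map a `/loopflow run <arg>` argument to the file that would be run."""
    # Assumption: a path separator or a .loop suffix marks a local file.
    looks_like_path = arg.endswith(".loop") or "/" in arg or "\\" in arg
    if looks_like_path:
        return Path(arg)
    return library / f"{arg}.loop"


if __name__ == "__main__":
    for arg in ("security", "proj-412.loop", "examples/fix_test.loop"):
        print(f"{arg} -> {resolve_target(arg)}")
```

Checking for a path first is what keeps the library from shadowing a loop that lives in the repo.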

Why this beats re-asking the model

Ask an LLM to "check this repo for security issues" and you get a fresh plan-then-execute every time — a new interpretation, no real verification, no memory of what "done" meant, and you re-type it in each project.
A saved loop carries its done when check, its reflect-and-retry, its gates, and its look at scope with it. Running it re-applies all of that, identically, anywhere. It's a reusable guarantee you built once — not a prompt you re-type and re-trust.

Claude is the first-class home for LoopFlow. The library and every /loopflow command live in the chat — that's where you watch each step and answer gates as the loop runs. VS Code is the second seat: handy for editing a .loop with highlighting and a ▶ Run button, but the loop still runs through Claude. If you only ever use one surface, use the chat.

Go deeper

This page is the tour. The depth lives next door:

📖 The full manual — every keyword, the CLI, how a run works, the loop-spec IR.
📖 Keyword reference — one page per keyword, with diagrams.
🛠️ Workshop — build a small todo app, hands-on.
🎮 LoopFlow Lab — learn loops by playing.
★ GitHub — the source, the open loop-spec, and issues.

Idan Ayalon — creator & maintainer of LoopFlow

Built Forge with it — see the sandbox-runner case study.

bar.idan@gmail.com · LinkedIn

LoopFlow · loop engineering · Apache-2.0 · GitHub. The full reference lives in AGENTS.md and docs/MANUAL.md; the IR contract is spec/loop-spec.schema.json. Every example on this page is a real, runnable .loop — most ship verbatim under examples/.