The spec → merge loop.
Seven steps from acceptance criteria to merge. The human appears exactly where their judgement is irreplaceable — at the spec. Everything below it is either agent work that gets mechanically attested, or a deterministic tool doing the attesting.
01
Spec
Acceptance criteria with [AC-n] ids.
The human writes the acceptance criteria — each one carrying a bracketed [AC-n] token — with the agent assisting. This is the one artifact a human actually reviews.
02
spec-lint
Deterministic lint over the criteria.
A rule-based linter flags missing [AC-n] ids, weasel wording, compound criteria, and criteria with no measurable subject → outcome. Then a semantic review: the agent proposes, the human ratifies.
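The deterministic half of this step could look roughly like the sketch below. The rule names, the weasel-word list, and the "and/or" heuristic for compound criteria are illustrative assumptions, not Speccle's actual rules, and the subject → outcome check is left out.

```typescript
// Hypothetical lint rules for acceptance criteria (one criterion per line).
type Finding = { line: number; rule: string; text: string };

const AC_ID = /\[AC-\d+\]/;
// Assumed word list; a real linter would make this configurable.
const WEASEL = /\b(should|appropriate|fast|quickly|reasonable|user-friendly)\b/i;
// Crude heuristic: a conjunction suggests two criteria packed into one.
const COMPOUND = /\b(and|or)\b/i;

function lintCriteria(lines: string[]): Finding[] {
  const findings: Finding[] = [];
  lines.forEach((text, i) => {
    const line = i + 1;
    if (!AC_ID.test(text)) findings.push({ line, rule: "missing-id", text });
    if (WEASEL.test(text)) findings.push({ line, rule: "weasel-word", text });
    if (COMPOUND.test(text)) findings.push({ line, rule: "compound", text });
  });
  return findings;
}
```

Because every rule is a plain pattern match, the same spec always yields the same findings. Anything that needs interpretation is left to the semantic review.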
03
Code
The agent writes the implementation.
Implementation is generated against the ratified spec. No human reads it line by line — that's the whole point.
04
Oracles
Tests per criterion, titles carrying the token.
The agent writes tests for each criterion. A test links to a criterion when its title contains the matching [AC-n] token — that link is what everything downstream measures.
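A minimal sketch of that link, assuming test titles are available as plain strings. The function names are hypothetical, and a title carrying several tokens is assumed to link to each of them.

```typescript
const AC_TOKEN = /\[AC-\d+\]/g;

// Returns every distinct [AC-n] token found in a test title.
function criteriaInTitle(title: string): string[] {
  const matches: string[] = title.match(AC_TOKEN) ?? [];
  return matches.filter((token, i) => matches.indexOf(token) === i);
}

// Groups test titles by the criterion tokens they carry.
function linkTests(titles: string[]): Record<string, string[]> {
  const links: Record<string, string[]> = {};
  for (const title of titles) {
    for (const token of criteriaInTitle(title)) {
      if (!links[token]) links[token] = [];
      links[token].push(title);
    }
  }
  return links;
}
```

A title with no token links to nothing, so an untagged test contributes nothing to any criterion.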
05
Execute
Vitest coverage + Stryker mutation, diff-scoped.
Speccle reads three inputs: the spec markdown from step 01, plus two artifacts of a normal test run: a Stryker mutation report with per-mutant coveredBy/killedBy data, and an Istanbul coverage summary.
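As a rough illustration of the per-mutant data, the sketch below flattens a mutation report into one record per mutant and resolves test ids to test titles. The interfaces are a hand-written subset of the mutation report JSON (files, mutants, testFiles, coveredBy, killedBy). The function name and the MutantView shape are assumptions made for this example.

```typescript
// Minimal subset of the mutation report; only the fields used here.
interface Mutant { id: string; status: string; coveredBy?: string[]; killedBy?: string[] }
interface MutationReport {
  files: Record<string, { mutants: Mutant[] }>;
  testFiles?: Record<string, { tests: { id: string; name: string }[] }>;
}

// Hypothetical flattened view: test ids replaced by test titles.
interface MutantView { file: string; id: string; status: string; coveredBy: string[]; killedBy: string[] }

function flattenMutants(report: MutationReport): MutantView[] {
  // Build a lookup from test id to test title.
  const names: Record<string, string> = {};
  const testFiles = report.testFiles ?? {};
  for (const path of Object.keys(testFiles)) {
    for (const t of testFiles[path].tests) names[t.id] = t.name;
  }
  // Fall back to the raw id when a title is unknown.
  const toNames = (ids: string[] = []) => ids.map((id) => names[id] ?? id);

  const views: MutantView[] = [];
  for (const file of Object.keys(report.files)) {
    for (const m of report.files[file].mutants) {
      views.push({
        file,
        id: m.id,
        status: m.status,
        coveredBy: toNames(m.coveredBy),
        killedBy: toNames(m.killedBy),
      });
    }
  }
  return views;
}
```

Resolving ids to titles matters because the [AC-n] token lives in the title, so titles are what connect a mutant back to a criterion.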
06
heatmap
Oracle strength per criterion — and the routing decision.
The heatmap joins spec, mutation report, and coverage into one ReportModel, then routes each finding to one of two exits.
Machine path
A surviving mutant is a test gap. The agent writes a test, re-runs — no human needed.
Human path
A weak criterion is a spec problem. The agent drafts a refined spec; the human approves.
The routing decision — fixable test gap vs vague spec needing escalation — is the heart of the loop, and the one judgement call the feedback agent makes.
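The sketch below shows one way the per-criterion number and the two exits could fit together. It assumes oracle strength is killed / (killed + survived) over the mutants covered by a criterion's linked tests, a formula this page does not state. The judgement call itself is not modelled: it enters as the hypothetical specJudgedWeak flag supplied by the feedback agent.

```typescript
interface CoveredMutant { status: string; coveringTitles: string[] }
interface Strength { killed: number; survived: number; theta: number | null }

// Assumed definition: share of killed mutants among those covered by
// tests whose title carries the criterion's token.
function strengthFor(token: string, mutants: CoveredMutant[]): Strength {
  const linked = mutants.filter((m) =>
    m.coveringTitles.some((title) => title.indexOf(token) !== -1)
  );
  const killed = linked.filter((m) => m.status === "Killed").length;
  const survived = linked.filter((m) => m.status === "Survived").length;
  const total = killed + survived;
  // null means "no evidence": no linked test covers any mutant.
  return { killed, survived, theta: total === 0 ? null : killed / total };
}

type Route =
  | { path: "human"; action: string }
  | { path: "machine"; action: string }
  | { path: "done" };

// specJudgedWeak is the agent's judgement, passed in rather than computed.
function route(s: Strength, specJudgedWeak: boolean): Route {
  if (specJudgedWeak) return { path: "human", action: "draft a spec refinement for approval" };
  if (s.survived > 0) return { path: "machine", action: "write a test and re-run" };
  return { path: "done" };
}
```

Keeping the judgement as an input makes the split explicit: the arithmetic is deterministic, and only the choice between a test gap and a spec problem is left to the agent.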
07
gate
Minimum θ per touched criterion → merge.
A CI mode that requires a minimum oracle strength for every criterion the diff touches, exiting 0 or 1. The literal replacement for code review.
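A gate along these lines might reduce to a single comparison per touched criterion. This is a sketch only: the function name, the input shape, and the choice to fail a criterion with no measured strength are assumptions.

```typescript
interface GateResult { exitCode: 0 | 1; failing: string[] }

// theta maps a criterion token to its oracle strength (null or missing = no evidence).
function gate(
  theta: Record<string, number | null | undefined>,
  touched: string[],
  minTheta: number
): GateResult {
  const failing = touched.filter((token) => {
    const value = theta[token];
    // Assumption: a touched criterion without evidence fails the gate.
    return value == null || value < minTheta;
  });
  return { exitCode: failing.length === 0 ? 0 : 1, failing };
}

// In a CI script the result could be applied with:
//   process.exitCode = gate(theta, touched, 0.8).exitCode;
// where 0.8 is an arbitrary example threshold.
```

Only criteria the diff touches are checked, so a weak criterion elsewhere in the spec does not block an unrelated change.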
The end state: merge without code review.
When gate lands, a diff merges when every criterion it touches holds a minimum oracle strength — reviewed spec in, mechanically attested code out. Try the shipped half of the loop in the live demo.