
The spec → merge loop.

Seven steps from acceptance criteria to merge. The human appears exactly where their judgement is irreplaceable — at the spec. Everything below it is either agent work that gets mechanically attested, or a deterministic tool doing the attesting.

Spec

[human] [agent]

Acceptance criteria with [AC-n] ids.

The human writes the acceptance criteria — each one carrying a bracketed [AC-n] token — with the agent assisting. This is the one artifact a human actually reviews.
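
To make the token convention concrete, here is a minimal sketch of reading criteria out of a spec. It assumes one criterion per markdown bullet and tokens of the form [AC-1]; the `Criterion` type and `parseCriteria` function are hypothetical names for illustration, not Speccle's API.

```typescript
// Hypothetical shape for one parsed acceptance criterion.
interface Criterion {
  id: string;   // e.g. "AC-1"
  text: string; // criterion wording with the token removed
}

// Assumption: each criterion is a markdown bullet carrying one [AC-n] token.
function parseCriteria(markdown: string): Criterion[] {
  const criteria: Criterion[] = [];
  for (const line of markdown.split("\n")) {
    const bullet = /^\s*[-*]\s+(.*)$/.exec(line);
    if (!bullet) continue; // headings and prose are skipped
    const body = bullet[1] ?? "";
    const token = /\[(AC-\d+)\]/.exec(body);
    if (!token) continue; // bullets without an id are left for a linter to flag
    criteria.push({ id: token[1] ?? "", text: body.replace(token[0], "").trim() });
  }
  return criteria;
}

// Made-up example spec.
const exampleSpec = [
  "# Checkout",
  "- [AC-1] A cart with zero items cannot be submitted.",
  "- [AC-2] Orders over 100 EUR ship free.",
].join("\n");

console.log(parseCriteria(exampleSpec));
```

The id is the only part anything downstream needs to match on; the wording stays free-form for the human reviewer.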

spec-lint

[tool] Coming

Deterministic lint over the criteria.

A rule-based linter flags missing [AC-n] ids, weasel wording, compound criteria, and criteria that lack a measurable subject → outcome pair. A semantic review follows: the agent proposes changes, the human ratifies them.
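
Since spec-lint has not shipped, the following is only a sketch of what rule-based checks of this kind could look like. The rule names, the word list, and the `lintSpec` function are all assumptions made for illustration; the "measurable subject → outcome" rule is left out because it needs more than a regular expression.

```typescript
// Hypothetical finding emitted by a lint rule.
interface Finding {
  rule: "missing-id" | "weasel-word" | "compound";
  line: number; // 1-based line number in the spec
  detail: string;
}

// Assumed word list; a real linter would likely make this configurable.
const WEASEL_WORDS = ["should", "fast", "appropriate", "user-friendly", "etc"];

function lintSpec(markdown: string): Finding[] {
  const findings: Finding[] = [];
  markdown.split("\n").forEach((raw, index) => {
    const bullet = /^\s*[-*]\s+(.*)$/.exec(raw);
    if (!bullet) return; // only bullets are treated as criteria
    const text = bullet[1] ?? "";
    const line = index + 1;
    if (!/\[AC-\d+\]/.test(text)) {
      findings.push({ rule: "missing-id", line, detail: "no [AC-n] token" });
    }
    for (const word of WEASEL_WORDS) {
      if (new RegExp(`\\b${word}\\b`, "i").test(text)) {
        findings.push({ rule: "weasel-word", line, detail: word });
      }
    }
    // Crude compound check: any "and"/"or" suggests two criteria in one.
    if (/\b(and|or)\b/i.test(text)) {
      findings.push({ rule: "compound", line, detail: "contains and/or" });
    }
  });
  return findings;
}

// Made-up spec: line 1 is clean, line 2 is vague and unlabelled, line 3 is compound.
const lintFindings = lintSpec([
  "- [AC-1] A cart with zero items cannot be submitted.",
  "- The page should load fast.",
  "- [AC-3] Totals include tax and shipping is shown separately.",
].join("\n"));

console.log(lintFindings);
```

Checks like these are deterministic and cheap, which is why they can run before the semantic review rather than replace it.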

Code

[agent]

The agent writes the implementation.

Implementation is generated against the ratified spec. No human reads it line by line — that's the whole point.

Oracles

[agent]

Tests per criterion, titles carrying the token.

The agent writes tests for each criterion. A test links to a criterion when its title contains the matching [AC-n] token — that link is what everything downstream measures.
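
In Vitest terms, a linked test is simply one whose title carries the token, for example `it("[AC-1] rejects an empty cart", ...)`. The sketch below shows how such titles could be grouped by criterion. The function names are hypothetical, and allowing several tokens in one title is an assumption.

```typescript
// Extract every [AC-n] id from a test title.
// Assumption: a title may carry more than one token.
function criteriaForTitle(title: string): string[] {
  const ids: string[] = [];
  const pattern = /\[(AC-\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(title)) !== null) {
    ids.push(match[1] ?? "");
  }
  return ids;
}

// Group test titles by the criterion they claim to exercise.
function linkTests(titles: string[]): Map<string, string[]> {
  const links = new Map<string, string[]>();
  for (const title of titles) {
    for (const id of criteriaForTitle(title)) {
      const linked = links.get(id) ?? [];
      linked.push(title);
      links.set(id, linked);
    }
  }
  return links;
}

// Made-up test titles; the last one carries no token and so links to nothing.
const testLinks = linkTests([
  "[AC-1] rejects an empty cart",
  "[AC-1][AC-2] empty cart never qualifies for free shipping",
  "formats currency",
]);

console.log(testLinks);
```

A test with no token still runs, but it contributes nothing to any criterion's measured strength.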

Execute

[tool]

Vitest coverage + Stryker mutation, diff-scoped.

A normal test run leaves Speccle with the three inputs it reads: the spec markdown, a Stryker mutation report with per-mutant coveredBy/killedBy data, and an Istanbul coverage summary.
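
As a rough illustration of the per-mutant data, the sketch below reads a trimmed-down mutation report and lists surviving mutants alongside the titles of the tests that covered them. The field names follow the mutation-testing report schema as commonly emitted by Stryker, but treat the exact shape as an assumption and check it against your own report. The helper names and the sample data are hypothetical.

```typescript
// Trimmed subset of the mutation-testing report schema.
// Assumption: these field names match the JSON your Stryker version emits.
interface MutantResult {
  id: string;
  mutatorName: string;
  status: string;       // e.g. "Killed", "Survived", "NoCoverage"
  coveredBy?: string[]; // ids of tests that executed the mutated code
  killedBy?: string[];  // ids of tests that failed against the mutant
}

interface MutationReport {
  files: Record<string, { mutants: MutantResult[] }>;
  testFiles?: Record<string, { tests: { id: string; name: string }[] }>;
}

interface Survivor {
  mutant: string;
  coveredByTitles: string[];
}

// Resolve test ids to titles, since the [AC-n] token lives in the title.
function testTitlesById(report: MutationReport): Map<string, string> {
  const titles = new Map<string, string>();
  const testFiles = report.testFiles ?? {};
  for (const path of Object.keys(testFiles)) {
    for (const test of testFiles[path]?.tests ?? []) {
      titles.set(test.id, test.name);
    }
  }
  return titles;
}

// List surviving mutants with the titles of the tests that covered them.
function listSurvivors(report: MutationReport): Survivor[] {
  const titles = testTitlesById(report);
  const survivors: Survivor[] = [];
  for (const path of Object.keys(report.files)) {
    for (const m of report.files[path]?.mutants ?? []) {
      if (m.status !== "Survived") continue;
      survivors.push({
        mutant: `${path}#${m.id} (${m.mutatorName})`,
        coveredByTitles: (m.coveredBy ?? []).map((id) => titles.get(id) ?? id),
      });
    }
  }
  return survivors;
}

// Hand-written sample, not real Stryker output.
const sampleReport: MutationReport = {
  files: {
    "src/cart.ts": {
      mutants: [
        { id: "1", mutatorName: "ConditionalExpression", status: "Killed", coveredBy: ["t1"], killedBy: ["t1"] },
        { id: "2", mutatorName: "EqualityOperator", status: "Survived", coveredBy: ["t1"] },
      ],
    },
  },
  testFiles: {
    "src/cart.test.ts": { tests: [{ id: "t1", name: "[AC-1] rejects an empty cart" }] },
  },
};

console.log(listSurvivors(sampleReport));
```

In the sample, mutant 2 survives even though an [AC-1] test covers it, which is exactly the kind of gap the later steps act on.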

heatmap

[tool] Shipped

Oracle strength per criterion — and the routing decision.

The heatmap joins spec, mutation report, and coverage into one ReportModel, then sends each finding to one of two exits: the machine path or the human path.
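
The page does not define how oracle strength is computed, so the sketch below assumes a simple ratio: for each criterion, the mutants killed by its linked tests divided by the mutants those tests covered. The `MutantRow` and `HeatmapRow` types and the `buildHeatmap` function are hypothetical and are not the real ReportModel.

```typescript
// Hypothetical, flattened input: one row per mutant, with test titles already resolved.
interface MutantRow {
  status: "Killed" | "Survived";
  coveredByTitles: string[];
  killedByTitles: string[];
}

interface HeatmapRow {
  id: string;
  killed: number;
  total: number;
  strength: number; // assumed definition: killed / total
}

// Collect the distinct [AC-n] ids mentioned across a list of test titles.
function criteriaIn(titles: string[]): string[] {
  const ids: string[] = [];
  for (const title of titles) {
    const pattern = /\[(AC-\d+)\]/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(title)) !== null) {
      const id = match[1] ?? "";
      if (ids.indexOf(id) === -1) ids.push(id);
    }
  }
  return ids;
}

function buildHeatmap(mutants: MutantRow[]): HeatmapRow[] {
  const tally: Record<string, { killed: number; total: number }> = {};
  for (const mutant of mutants) {
    // A mutant counts toward every criterion whose tests covered it...
    for (const id of criteriaIn(mutant.coveredByTitles)) {
      const entry = tally[id] ?? (tally[id] = { killed: 0, total: 0 });
      entry.total += 1;
    }
    // ...and as killed only for criteria whose tests actually killed it.
    for (const id of criteriaIn(mutant.killedByTitles)) {
      const entry = tally[id];
      if (entry) entry.killed += 1;
    }
  }
  return Object.keys(tally).sort().map((id) => {
    const entry = tally[id] ?? { killed: 0, total: 0 };
    return { id, killed: entry.killed, total: entry.total, strength: entry.killed / entry.total };
  });
}

// Made-up data: two tests, four mutants.
const t1 = "[AC-1] rejects an empty cart";
const t2 = "[AC-2] ships free over 100 EUR";
const heatmapRows = buildHeatmap([
  { status: "Killed", coveredByTitles: [t1], killedByTitles: [t1] },
  { status: "Survived", coveredByTitles: [t1, t2], killedByTitles: [] },
  { status: "Killed", coveredByTitles: [t2], killedByTitles: [t2] },
  { status: "Killed", coveredByTitles: [t2], killedByTitles: [t2] },
]);

console.log(heatmapRows);
```

Under this assumed definition, AC-1 scores 1 of 2 and AC-2 scores 2 of 3; a different definition of strength would change the numbers but not the shape of the join.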

Machine path

A surviving mutant is a test gap. The agent writes a test, re-runs — no human needed.

Human path

A weak criterion is a spec problem. The agent drafts a refined spec; the human approves.

The routing decision — fixable test gap vs vague spec needing escalation — is the heart of the loop, and the one judgement call the feedback agent makes.
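
One way to picture the two exits is as a discriminated union. In the sketch below, the diagnosis itself is taken as an input, because it is the feedback agent's judgement call and no rule for it is given here. The type and function names are hypothetical.

```typescript
// The diagnosis is the feedback agent's judgement; this sketch does not compute it.
type Diagnosis = "test-gap" | "weak-criterion";

// Two exits, each with its own follow-up action.
type Route =
  | { path: "machine"; criterion: string; action: "write-test-and-rerun" }
  | { path: "human"; criterion: string; action: "draft-spec-refinement" };

function route(criterion: string, diagnosis: Diagnosis): Route {
  switch (diagnosis) {
    case "test-gap":
      // Surviving mutant: the agent adds a test and re-runs, no human involved.
      return { path: "machine", criterion, action: "write-test-and-rerun" };
    case "weak-criterion":
      // Vague spec: the agent drafts a refinement and a human approves it.
      return { path: "human", criterion, action: "draft-spec-refinement" };
  }
}

console.log(route("AC-1", "test-gap"));
console.log(route("AC-2", "weak-criterion"));
```

Keeping the human path as a distinct variant makes it hard for a caller to treat a spec problem as something the agent can fix on its own.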

gate

[tool] Coming

Minimum θ per touched criterion → merge.

A CI mode that requires a minimum oracle strength θ for every criterion the diff touches and exits 0 or 1 accordingly. It is the literal replacement for code review.
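
Because gate has not shipped, the following is only a sketch of the check it describes. It assumes that exit code 0 means every touched criterion meets θ, that 1 means at least one does not, and that a touched criterion with no score counts as strength 0. The names `runGate`, `CriterionScore`, and `GateResult` are hypothetical.

```typescript
interface CriterionScore {
  id: string;
  strength: number; // oracle strength in [0, 1]
}

interface GateResult {
  exitCode: 0 | 1;   // assumed convention: 0 = pass, 1 = fail
  failing: string[]; // touched criteria below the threshold
}

function runGate(scores: CriterionScore[], touched: string[], theta: number): GateResult {
  const strengthById = new Map<string, number>();
  for (const score of scores) strengthById.set(score.id, score.strength);
  // Assumption: a touched criterion with no score counts as strength 0.
  const failing = touched.filter((id) => (strengthById.get(id) ?? 0) < theta);
  return { exitCode: failing.length === 0 ? 0 : 1, failing };
}

// Made-up scores. The diff touches AC-1 and AC-2 only, so AC-3 is ignored.
const gateResult = runGate(
  [
    { id: "AC-1", strength: 0.9 },
    { id: "AC-2", strength: 0.6 },
    { id: "AC-3", strength: 0.2 },
  ],
  ["AC-1", "AC-2"],
  0.8,
);

console.log(gateResult); // exitCode is 1 because AC-2 falls below 0.8
```

In a CI job the result would typically be passed on as the process exit code, so the merge is blocked without any human reading the diff.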

The end state: merge without code review.

When gate lands, a diff merges when every criterion it touches holds a minimum oracle strength — reviewed spec in, mechanically attested code out. Try the shipped half of the loop in the live demo.