In the wild.
Real captured agent sessions, lifted line-for-line, nothing paraphrased. They show the thing the homepage names: catching slop is the side effect. What slop-mop really does is steer the work, turning CI and scattered review threads into a sequenced path and handing the agent its next step. No single dramatic save. Just the same loop, ridden cleanly, thousands of times, and, at the end, the one time slop-mop got it wrong.
1 · The pile becomes a queue
Ten review comments, scattered across logic, tests, and docs. sm buff status collapses
them into one ranked, categorized batch with a single next step — so the agent works a sequence
instead of surfing GitHub.
    $ sm buff status
    🪣 sm buff status - CI Status Check
    🔀 PR: #2
    Buff status blocked: CI checks are clean, but unresolved PR review threads remain.
    PR #2: 10 unresolved comment(s)
    By category:
    • 🐛 Logic/Correctness: 7
    • 🧪 Testing: 1
    • 📚 Documentation: 1
    • 💭 General: 1
    Next step: run 'sm buff inspect' to take the next review batch.
The power isn’t the catching — it’s that the work arrives pre-sorted, every cycle.
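To make the pre-sorting concrete, here is a minimal Python sketch of turning a pile of review comments into category counts plus an impact-ranked queue. It is an illustration only, not slop-mop's implementation: the `Comment` type, the `triage` function, the thread ids, and the impact scores are all assumptions invented for the example. Only the 7/1/1/1 category split is taken from the status output above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    thread_id: str  # hypothetical id, not a real GitHub thread
    category: str
    impact: int     # hypothetical score; higher means handle sooner


def triage(comments):
    """Return (counts per category, comments ranked by impact, highest first)."""
    counts = Counter(c.category for c in comments)
    # sorted() is stable, so equal-impact comments keep their arrival order.
    queue = sorted(comments, key=lambda c: c.impact, reverse=True)
    return counts, queue


# Ten comments in arrival order, mirroring the 7/1/1/1 split in the status output.
pile = (
    [Comment("t-docs", "Documentation", 30), Comment("t-general", "General", 10)]
    + [Comment(f"t-logic-{i}", "Logic/Correctness", 95 - i) for i in range(7)]
    + [Comment("t-test", "Testing", 60)]
)

counts, queue = triage(pile)
print(dict(counts))
print("next:", queue[0].thread_id)
```

The point of the sketch is the shape of the result: one tally for the summary and one ordered queue whose head is the next step, so nothing downstream has to decide what comes first.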
2 · It hands over the scaffolding
slop-mop doesn’t just list the threads — it writes a command pack: a scenario menu and a
templated sm buff resolve for each thread, ranked by impact, waiting to be filled in.
    # SCENARIO MENU — choose one per thread after investigating
    # fixed_in_code — Code addresses the feedback. Cite the commit.
    # invalid_with_explanation — Feedback is incorrect. Explain with evidence.
    # no_longer_applicable — Code changed since comment. Note what changed.
    # out_of_scope_ticketed — Valid but not this PR. File issue, link it.
    # needs_human_feedback — Need reviewer input. Uses --no-resolve.
    # ── [1] PRRT_kwDOSJOA585-2xyL ──
    # Category: 🐛 Logic/Correctness (impact=95)
    # Location: client/src/features/history/HistoryTimelineView.tsx:170
    # >> Investigate this thread, choose a scenario, replace <SCENARIO> and <YOUR_EVIDENCE>:
    sm buff resolve 2 PRRT_kwDOSJOA585-2xyL --scenario <SCENARIO> --message "<YOUR_EVIDENCE>"
The agent isn’t left to invent a response format or a priority order — the rail supplies both.
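As a rough illustration of what filling in the template involves, the Python sketch below builds a completed resolve command from the pieces shown in the command pack. The helper `build_resolve_command` and the commit hash `abc1234` are hypothetical and exist only for this example. The scenario names, the thread id, and the `--scenario` and `--message` flags are copied from the captured output. The sketch only builds a string; it never runs `sm`.

```python
import shlex

# The five scenarios listed in the command pack's menu.
SCENARIOS = {
    "fixed_in_code",
    "invalid_with_explanation",
    "no_longer_applicable",
    "out_of_scope_ticketed",
    "needs_human_feedback",
}


def build_resolve_command(pr, thread_id, scenario, evidence):
    """Fill the templated command; raise if a placeholder was left in."""
    if scenario not in SCENARIOS:
        raise ValueError(f"unknown scenario: {scenario!r}")
    if not evidence.strip() or evidence == "<YOUR_EVIDENCE>":
        raise ValueError("evidence must be filled in")
    argv = ["sm", "buff", "resolve", str(pr), thread_id,
            "--scenario", scenario, "--message", evidence]
    # shlex.join quotes the message so it stays a single shell argument.
    return shlex.join(argv)


cmd = build_resolve_command(
    2,
    "PRRT_kwDOSJOA585-2xyL",
    "fixed_in_code",
    "Fixed in abc1234 (hypothetical commit hash).",
)
print(cmd)
```

Rejecting an unfilled `<SCENARIO>` or `<YOUR_EVIDENCE>` is a design choice of this sketch: it keeps a half-completed template from being sent as a reply to a reviewer.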
3 · The rail does the steering
Four more, from real sessions across three repos — the agent narrating, in its own words, what the rail is doing for it.
- welcome-to-willville · PR · 2026-05-22
  “I’ve got the buff rail initialized. I’m taking the next deterministic step from the tool itself now with sm buff iterate so we move from triage into the first fix batch instead of stopping at inspection.”
  The agent doesn’t decide what to fix first. The rail hands it a scoped batch and it starts working: deterministic, not improvised.
- fogofdog-frontend · PR #87 · 2026-04-22
  “I’ll use sm buff watch to monitor the PR rail rather than guessing from GitHub directly… The watch saw CI still running with 4 passed and 2 pending. I’m checking the PR rail again through sm buff status so I don’t leave this half-watched.”
  The agent routes CI through slop-mop instead of eyeballing GitHub, and never leaves the task to babysit a dashboard.
- welcome-to-willville · PR #14 · 2026-05-26
  “I’m resolving both PR threads through sm buff resolve with the new commit hash… After resolving the first two, the next buff loop pulled in six more unresolved threads. I’m reading that second report now.”
  Answering reviewers is part of the rail: each thread closed with commit evidence, the next round surfaced automatically. The conversation gets driven to done instead of trailing off.
- fogofdog-frontend · PR #85 · 2026-03-24
  “I’m running sm sail, which should choose the right next PR/CI action automatically… Buff has moved on from CI to review-thread remediation: there are 13 unresolved comments grouped by category. I’m handing CI off to sm buff watch 85 so the PR stays on the repo’s supported rail.”
  When the agent would otherwise stall on “what now?”, sail hands it the next valid verb. Over thousands of cycles, that’s the difference between finishing the workflow and declaring victory three steps early.
4 · The loop closes green
After the fixes land and the threads are answered, the rail confirms both halves are clean — CI and review — before the agent calls it done.
    $ sm buff status 16 && sm buff verify 16
    🪣 sm buff status - CI Status Check
    🔀 PR: #16
    ✨ CI CLEAN · 3/3 checks passed
    ✅ integration-evidence (8s)
    ✅ verify (3m 18s)
    ✅ Cursor Bugbot (5m 22s)
    Buff verify clean: PR #16 has no unresolved review threads.
“Done” isn’t the agent’s opinion — it’s the rail confirming CI green and every thread resolved.
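The two-part rule can be written down in a few lines. The Python sketch below is a hedged illustration of the condition, not slop-mop's code: the `Check` type and the `is_done` function are hypothetical, and treating a PR with zero reported checks as not clean is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    name: str
    passed: bool


def is_done(checks, unresolved_threads):
    """True only when every CI check passed and no review thread is open."""
    # Assumption: a PR with zero reported checks is not treated as clean.
    ci_clean = len(checks) > 0 and all(check.passed for check in checks)
    return ci_clean and unresolved_threads == 0


# Check names copied from the PR #16 output above.
pr16_checks = [
    Check("integration-evidence", True),
    Check("verify", True),
    Check("Cursor Bugbot", True),
]

print(is_done(pr16_checks, unresolved_threads=0))   # both halves clean
print(is_done(pr16_checks, unresolved_threads=10))  # CI clean, threads still open
```

The second call mirrors the blocked state from section 1, where CI was clean but ten review threads were still open. Green checks alone do not count as done.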
5 · When it’s wrong
slop-mop isn’t perfect, and we say so out loud. The honest part isn’t the miss — it’s what happens next.
- fogofdog-frontend · PR #102 · 2026-05-13
  “The bug: sm buff watch 101 ran silently against a stale PR instead of detecting that feat/perf-testing already has an open PR (#102)… Barnacle filed at github.com/ScienceIsNeato/slop-mop/issues/197.”
  The work had merged and been pushed back onto the old branch, so the rail followed a stale PR for a whole session. The healthy move wasn’t to abandon the tool; it was to file a barnacle. That feedback loop is how slop-mop itself gets better over time, the same way the agent does. The friction is the feature.
Every panel here is a real session, captured verbatim by SpecStory and trimmed only for length — never reworded. Provenance (repo, PR, date) is stamped on each so you can check it.
Harm reduction, not a cure: a lower baseline of damage, earned one ridden loop at a time. That’s the whole claim.