🪣 slop-mop

Case Studies.

Real-world walkthroughs of onboarding production-grade codebases into slop-mop Maintenance Mode. Each one runs sm refit to identify, categorize, and remediate legacy shortcuts, typing gaps, and configuration drift.

Why · refit, then maintenance

slop-mop runs in two modes that do very different jobs. Knowing which one a case study is about is half of reading it honestly.

Refit · one-time onboarding. The single goal is a seaworthy ship: a repo with no known issues and a committed baseline. You run it once, and this case study is about that run.
Maintenance · every watch: sm swab, sm scour, sm buff. This is the steady state, and where the lion’s share of the real code-quality work gets exercised. The buff cycle, which turns CI results and review feedback into the next fix, is the throughput multiplier that makes the whole thing pay off.

Why bother with refit at all? Because you cannot swab and scour mechanically every watch and trust the result unless you start from a known baseline. Refit draws that line. After it, everything maintenance flags is new slop, introduced by the change in front of you, not years of accumulated legacy noise. That line is what makes the daily loop fast and worth trusting.
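The baseline idea can be sketched in a few lines. This is only an illustration of the general technique (report what is absent from a recorded baseline), not slop-mop's actual mechanism, which this page does not describe. The Finding fields, the new_findings helper, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One gate finding. The fields are illustrative, not slop-mop's schema."""
    gate: str      # e.g. "laziness:dead-code.py"
    path: str      # file the finding points at
    message: str   # human-readable description


def new_findings(current: list[Finding], baseline: set[Finding]) -> list[Finding]:
    """Return only the findings that are absent from the recorded baseline."""
    return [f for f in current if f not in baseline]


# Hypothetical data: one legacy finding recorded at refit time...
baseline = {Finding("laziness:dead-code.py", "legacy/util.py", "unused function")}
# ...and a later run that sees the legacy finding plus one introduced by a change.
current = [
    Finding("laziness:dead-code.py", "legacy/util.py", "unused function"),
    Finding("laziness:debugger-artifacts", "app/server.py", "leftover breakpoint()"),
]

fresh = new_findings(current, baseline)
for finding in fresh:
    print(f"{finding.gate}: {finding.path}: {finding.message}")
```

Only the debugger-artifacts finding is printed. The legacy entry is still there, but the reviewer of the current change never has to look at it.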

The ideal refit is as frictionless as possible. Fixing obvious slop is not friction: auto-fix handles that for free. The friction worth naming is slop-mop’s own: this is a young tool with limited support, and onboarding a 322k-line codebase surfaced real rough edges in the tool itself. We log those as barnacles rather than hide them.

And “refit didn’t fix much” cuts both ways. It can mean the repo already had strong practices, or that slop-mop disabled gates and lowered thresholds just to go green, which is exactly what we don’t want. This onboarding was a mix, leaning toward baselining (see the honest breakdown below). The upshot is still a good one: OpenHands is now mostly onboarded. Ideally the coverage threshold climbs from 49% over time, but the team can already run the maintenance loop without fighting refit churn.

Case Study #1 · Onboarding OpenHands

OpenHands is a massive developer assistant platform. Onboarding its primary repository (300k+ lines of Python) to slop-mop required running sm refit to review all quality gates, establish baseline configurations, and execute step-by-step code remediation.

Repository: OpenHands/OpenHands
Codebase Size: ~322,000 lines
Language: Python, TypeScript
Initial State: Unvalidated (Onboarding)
Final State: scour_clean (Maintenance Mode Active)
Remediation PR: PR #14719
Barnacles filed (against ourselves): #263, #264

Onboarding Findings & Remediation

Running sm refit --start analyzed the project across all active checks and identified findings in 7 different quality gates. The remediation path combined targeted refactoring with precise scoping configuration to lock down the codebase.

What was actually fixed, versus baselined

The findings above use the word “remediation” generously. Here is the honest split of a 400+-file PR whose bulk was reformatting: a small amount of genuine fixing, and a lot of “accept the existing state and guard against regressions from here.” Onboarding a 322k-line legacy codebase is mostly the latter. Refit earns a clean starting line, not a rewrite of history, and we won’t dress baselining up as a deep clean.

Genuinely fixed: GitHub Actions least-privilege (contents: read); a Docker sandbox test corrected (port-map keys int → str, and a hardcoded /tmp path swapped for tempfile.gettempdir()); and duplicate pagination Query() titles extracted to shared constants across 11 routers (see the second pass below). A small, real set of changes, not a 322k-line clean.
Suppressed: 61 # type: ignore / # noqa markers added. The type and lint gates were silenced, not satisfied.
Baselined (Python): Coverage frozen at the existing 49%; strict_typing: false and strict: false downgrade the strict-typing and type-blindness gates.
Scoped out: The TypeScript front-end gate suite (formatting, dead-code, type-checking, bogus-tests) and three meta-gates are disabled in the committed config. Those meta-gates include gate-dodging and silenced-gates, the gates that flag suppression. The green board below is the Python core, not the whole repo.
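For readers who want to picture the Docker sandbox test fix, here is a minimal sketch of the two changes it names: string port-map keys and tempfile.gettempdir() in place of a literal /tmp. It is not the code from the PR. The helper names normalize_port_map and scratch_path, and the sample ports, are assumptions made for illustration.

```python
import os
import tempfile


def normalize_port_map(ports: dict) -> dict[str, int]:
    """Return a copy of a port map whose keys are all strings."""
    return {str(key): value for key, value in ports.items()}


def scratch_path(name: str) -> str:
    """Build a path under the platform temp directory instead of a literal /tmp."""
    return os.path.join(tempfile.gettempdir(), name)


# Hypothetical inputs: int keys, as a test might have written them originally.
ports = normalize_port_map({8000: 8000, 8001: 9001})
workdir = scratch_path("sandbox-test")

print(ports)    # {'8000': 8000, '8001': 9001}
print(workdir)  # location depends on the platform and on TMPDIR
```

The point of tempfile.gettempdir() is portability: it honors the TMPDIR, TEMP, and TMP environment variables and falls back to a platform default, so the test no longer assumes a Unix-style /tmp exists.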

Read the green board that follows with that in mind: it is real, but it is the score for a deliberately scoped, baselined Python surface, not a claim that 322k lines got cleaned.

OpenHands · sm status
$ sm status

🪣 sm status - Project Status Check
🔀 Project: OpenHands
🔧 State: scour_clean

✨ MAINTENANCE MODE ACTIVE · 20/20 checks passed

   ✅ myopia:dependency-risk.py  (passed)
   ✅ myopia:github-actions-hygiene  (passed)
   ✅ myopia:ambiguity-mines.py  (passed)
   ✅ deceptiveness:bogus-tests.py  (passed)
   ✅ laziness:complexity-creep.py  (passed)
   ✅ laziness:dead-code.py  (passed)
   ✅ laziness:debugger-artifacts  (passed)
   ✅ overconfidence:coverage-gaps.py  (passed)
   ✅ overconfidence:missing-annotations.py  (passed)
   ✅ overconfidence:type-blindness.py  (passed)
   ✅ deceptiveness:gate-dodging  (passed)
   ✅ laziness:silenced-gates  (passed)

Executing sm status on OpenHands reports a clean slop-mop board: slop-mop's own gate suite, now with the gate-dodging and silenced-gates meta-gates re-enabled in the second pass. That is a separate scoreboard from OpenHands' native CI. Getting that one green took the reconciliation in the ledger below, and as of this writing it is green: the PR passes every check and is blocked only on a maintainer's review.
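Since the meta-gates exist to flag suppression, it helps to see how a suppression count can be taken at all. The sketch below is a rough, regex-based audit of # type: ignore and # noqa comments. It is an assumption-laden illustration, not how silenced-gates or gate-dodging are implemented, and the count_suppressions name and sample source are hypothetical.

```python
import re
from collections import Counter

# Matches the two inline suppression styles named in this case study.
SUPPRESSION = re.compile(r"#\s*(type:\s*ignore|noqa)\b")


def count_suppressions(source: str) -> Counter:
    """Count '# type: ignore' and '# noqa' comments in Python source text."""
    counts: Counter = Counter()
    for line in source.splitlines():
        for match in SUPPRESSION.finditer(line):
            kind = "type: ignore" if match.group(1).startswith("type") else "noqa"
            counts[kind] += 1
    return counts


# The sample is only scanned as text; it is never executed.
sample = '''
import os  # noqa: F401
value = compute()  # type: ignore[no-untyped-call]
total = value + 1
'''
counts = count_suppressions(sample)
print(dict(counts))
```

A plain regex will also match marker text that appears inside string literals, so treat the result as an upper bound. A stricter audit would walk comment tokens with the standard tokenize module instead.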

A second pass ยท did we leave real slop on the table?

A fair question, so we went back and looked harder for genuine slop hiding behind the baseline. The honest result: very little turned up, and that scarcity is itself the signal.

That answers the earlier ambiguity directly: this baseline reflects good existing practice, not slop swept under a rug. OpenHands runs its own eslint, tsc, ruff, and mypy, all green, so most of what slop-mop “downgraded” was either redundant with the host's CI or simply stricter than it. The honest next step belongs to the OpenHands team, not us: raise the 49% coverage floor over time. Onboarding earned the clean starting line; the deep cleaning, where any remains, happens in maintenance.
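One common way to raise a coverage floor over time is a ratchet: the floor moves up whenever measured coverage exceeds it and never moves down. The sketch below shows that policy starting from the 49% baseline. It is one possible approach, not a slop-mop feature described on this page. The function names and the later measurements are hypothetical.

```python
import math


def ratchet_floor(current_floor: int, measured: float) -> int:
    """Raise the floor to the measured coverage (rounded down); never lower it."""
    return max(current_floor, math.floor(measured))


def check_coverage(floor: int, measured: float) -> bool:
    """A change passes when measured coverage is at or above the floor."""
    return measured >= floor


floor = 49  # the baseline frozen during refit
for measured in (49.4, 52.7, 51.0):  # hypothetical later measurements
    passed = check_coverage(floor, measured)
    floor = ratchet_floor(floor, measured)
    print(f"measured={measured:.1f}% passed={passed} floor={floor}%")
```

The third measurement fails because the second one already lifted the floor to 52%. Rounding down keeps small run-to-run fluctuations from failing the very next change.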

The honest ledger · where slop-mop got it wrong

We are not selling a green checkmark. The board above is slop-mop's own gates. Getting OpenHands' native CI to pass (its ruff lint and its migration checks) took real work, and slop-mop made genuine mistakes getting there. We hold ourselves to the same standard we ask of our users: when the tool creates friction, file a barnacle. Here are the ones this case study produced, against our own repo.

Lessons learned: Onboarding a large codebase is not about rewrite-everything. It is about identifying structural risks (like bogus tests), locking in a baseline (like coverage thresholds), and scoping rules so future PRs are validated without legacy noise.

It is also about being honest when the tool gets it wrong. Two of the findings above were slop-mop's own mistakes, and both became barnacles (#263, #264), the same friction-reporting loop we ask every user to run. We would rather show you the warts and the fixes than a flawless screenshot.