
The dark factory is not the point


Most teams do not need a “dark factory.” They need a better way to prove that AI-generated changes are safe to ship.

That is the core argument of this piece.

If this “dark factory” conversation is new to you, start with the two posts discussed below: Dan Shapiro’s maturity model for AI-assisted development and Simon Willison’s analysis of StrongDM’s work.

The recent “software factory” discussion making its way across AI and engineering blogs is useful because it makes this bottleneck visible. Dan Shapiro gives a clear maturity model for AI-assisted development, from “spicy autocomplete” to full autonomy. Simon Willison’s analysis of StrongDM’s work highlights the harder question behind the hype: not “can agents write code?” but “how do we know the output is correct?”

Use levels as diagnostics, not destiny

Shapiro’s levels are helpful when treated as a diagnostic tool. They give teams language for the shift from “AI helps me type faster” to “my job is now orchestration, validation, and judgment.”

They are less helpful when treated as a maturity ladder every team must climb. Different products, risk profiles, and compliance requirements should lead to different stopping points.

For engineering leaders, the practical use of the model is this:

  1. Identify your current operating mode honestly.
  2. Identify the current bottleneck (usually review and validation latency).
  3. Invest in the bottleneck before pushing for more autonomy.

What the factory metaphor gets right

The factory metaphor is directionally right in three ways.

  1. Specs matter more than before. Ambiguous intent gets amplified by agents.
  2. The role of engineers shifts toward system design, evaluation design, and risk management.
  3. Small teams can gain disproportionate leverage when verification is strong.

StrongDM’s published framing also reinforces a useful idea: autonomous code generation is only credible when connected to strong scenario-based validation.
Source: StrongDM Factory

Where the metaphor breaks

The metaphor becomes risky when it implies that humans are no longer needed in software delivery.

In practice, “no human review” is only defensible if review is replaced with something stronger than traditional PR review for the risk you are taking. As described by Simon Willison, StrongDM emphasizes scenario testing, holdout-style checks, and probabilistic “satisfaction” rather than only line-by-line human inspection.
Source: Simon Willison

That is not a no-human system. It is a human-repositioned system: humans stop inspecting every diff and instead design the checks, thresholds, and escalation paths that do the inspecting.

The operating model that scales

For most engineering organizations, the scalable model is:

  1. Generation pipeline: agents propose diffs from tasks/specs.
  2. Verification pipeline: independent checks accept or reject diffs.
  3. Governance pipeline: humans set policy, thresholds, and escalation rules.

This model is more robust than “agent writes code, reviewer glances at PR” because it makes quality gates explicit and repeatable.
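The three-pipeline model above can be sketched in a few lines. This is a minimal illustration, not StrongDM’s implementation; all names (`Diff`, `verification_gate`, the individual checks) are hypothetical, and the checks are stand-ins for real test runs.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Diff:
    change_id: str
    source: str   # "human" or "agent"
    patch: str

# A check is an independent verifier: it never trusts the generator.
Check = Callable[[Diff], bool]

def verification_gate(diff: Diff, checks: list[Check], min_passing: int) -> bool:
    """Accept a diff only if enough independent checks pass.

    `min_passing` is the governance knob: humans set it per policy,
    rather than reviewing each diff line by line.
    """
    passed = sum(1 for check in checks if check(diff))
    return passed >= min_passing

# Stand-in checks; in practice these would run a test suite and
# end-to-end scenario tests against the patched tree.
def unit_tests_pass(diff: Diff) -> bool:
    return True

def scenario_suite_pass(diff: Diff) -> bool:
    return True

diff = Diff("chg-001", "agent", "...")
accepted = verification_gate(diff, [unit_tests_pass, scenario_suite_pass], min_passing=2)
```

The point of the sketch is the separation of concerns: the generation pipeline produces `Diff` objects, the verification pipeline owns the checks, and governance owns `min_passing`.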

If you want autonomy, you have to pay for evidence.

Evidence usually means scenario coverage tied to real failure modes, holdout-style evaluations the generator cannot see, and tracked acceptance and rollback metrics.

What to do in the next 90 days

For engineers:

  1. Turn recurring production bug patterns into automated end-to-end test scenarios.
  2. Build at least one holdout-style eval set agents cannot see while generating.
  3. Measure first-pass acceptance and post-release rollback rates, split by change source (human vs agent-assisted).
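Step 3 above is a straightforward aggregation once changes are tagged by source. A minimal sketch, assuming hypothetical change records pulled from your CI/CD and incident data:

```python
from collections import defaultdict

# Hypothetical change records; field names are illustrative.
changes = [
    {"source": "agent", "first_pass_accepted": True,  "rolled_back": False},
    {"source": "agent", "first_pass_accepted": False, "rolled_back": False},
    {"source": "human", "first_pass_accepted": True,  "rolled_back": True},
    {"source": "human", "first_pass_accepted": True,  "rolled_back": False},
]

def rates_by_source(changes):
    """Split first-pass acceptance and rollback rates by change source."""
    groups = defaultdict(list)
    for change in changes:
        groups[change["source"]].append(change)
    out = {}
    for source, items in groups.items():
        n = len(items)
        out[source] = {
            "first_pass_acceptance": sum(c["first_pass_accepted"] for c in items) / n,
            "rollback_rate": sum(c["rolled_back"] for c in items) / n,
        }
    return out

metrics = rates_by_source(changes)
```

Tagging the source on every change is the prerequisite; without it you cannot tell whether agent-assisted changes are actually safer or riskier than human ones.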

For engineering leaders:

  1. Define autonomy tiers by system criticality.
  2. Require explicit quality gates before increasing autonomy in any tier.
  3. Measure review latency, escaped defects, and rollback rate as first-class metrics.
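The three leadership steps above compose into a simple policy table. The tier names, gates, and thresholds below are illustrative assumptions, not a standard:

```python
# Hypothetical autonomy-tier policy, keyed by system criticality.
AUTONOMY_TIERS = {
    "internal-tooling": {
        "auto_merge": True,
        "required_gates": ["unit_tests", "scenario_suite"],
        "max_rollback_rate": 0.05,
    },
    "customer-facing": {
        "auto_merge": False,  # agent proposes, human approves
        "required_gates": ["unit_tests", "scenario_suite", "holdout_evals"],
        "max_rollback_rate": 0.02,
    },
    "payments": {
        "auto_merge": False,
        "required_gates": ["unit_tests", "scenario_suite", "holdout_evals", "human_review"],
        "max_rollback_rate": 0.0,
    },
}

def may_increase_autonomy(tier: str, observed_rollback_rate: float) -> bool:
    """Gate autonomy expansion on measured evidence, not enthusiasm."""
    policy = AUTONOMY_TIERS[tier]
    return observed_rollback_rate <= policy["max_rollback_rate"]
```

Writing the policy down makes the quality gates explicit: autonomy in a tier only expands when the measured rollback rate stays under that tier's threshold.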

Where human review should remain mandatory

Even with strong automation, keep mandatory human review for your highest-criticality tiers, where the cost of an escaped defect is unacceptable.

Autonomy is not a binary switch. It is a scoped capability that should expand only when evidence quality improves.

Bottom line

The goal is not to eliminate engineers. The goal is to move engineers to the highest-leverage work: defining outcomes, designing boundaries, and deciding what to trust.

Call it a factory if you want. Operationally, it is an evidence engine.
