AI Ships Code—Nobody Reviews It

Hand holding digital AI and ChatGPT graphics.

AI “dark factories” promise lightning-fast software, but they also threaten to turn critical systems into unaccountable black boxes that nobody in the building actually reviews.

Story Snapshot

Developer and Django co-creator Simon Willison says AI coding agents have crossed an “inflection point,” accelerating a shift toward fully automated software production.
The “dark factory” model removes human code writing and even human code review, replacing it with specs, tests, and tooling.
StrongDM publicly described a working “software factory” process with strict rules designed to keep humans out of the code path.
Willison warns prompt injection remains an unsolved security problem, raising risks as autonomy increases.

What “Dark Factory” Means: Software Built Without Human Review

Simon Willison used a recent podcast discussion to spotlight an emerging pattern in AI-assisted development he calls the “dark factory”—a process where coding agents run autonomously and ship changes without a human reviewing the code. The concept draws from manufacturing’s “dark factory,” where robots operate without lights because people aren’t on the floor. In software, that translates to a pipeline that turns requirements into production changes with minimal human touch.

Willison ties the model to a broader “levels” framework credited to Dan Shapiro, where higher levels mean less human involvement, similar to autonomy levels in self-driving cars. Level five, the “dark factory,” is the point where the process stops looking like traditional software engineering. That framing matters because it clarifies what is changing: teams aren’t just using AI as a tool; they are redesigning the entire production system around automation.

StrongDM’s “Software Factory” Shows This Isn’t Just Theory

One reason the story is getting attention is that Willison points to a real implementation rather than a hypothetical future. StrongDM described a “software factory” approach that explicitly forbids humans from writing code and from reviewing code, relying instead on automated checks and process rules. The team even floated an operational benchmark—spending at least $1,000 per engineer per day on tokens—as a signal that further automation and iteration is expected.

The appeal is easy to understand: if a small team can increase throughput dramatically, businesses facing tight budgets and competitive pressure will try it. But the model shifts accountability. In the classic workflow, responsibility is anchored to the developer and the reviewer who signed off. In a dark factory, responsibility shifts to the system designers, test authors, and whoever controls deployment gates—assuming those gates stay meaningful under speed and cost pressure.

The Security Problem: Prompt Injection Is Still “Unsolved”

Willison’s caution centers on security, especially prompt injection—where an attacker manipulates the instructions or context an AI agent relies on, steering it toward unintended actions. He has argued this is not a solved problem, and the stakes rise when agents gain broad permissions and are trusted to change code, infrastructure, or policies without a human reading the diff. Removing code review also removes one of the most common “sanity checks” teams rely on.

Supporters of automation argue that better testing, simulation, and policy enforcement can replace human review. That may be true in narrow cases, but the research available here does not provide long-term performance metrics, incident rates, or independent audits of dark-factory pipelines. The limited public data means readers should distinguish between “a pattern that exists” and “a pattern that is proven safe at scale,” especially for systems that affect finances, healthcare, transportation, or national security.

Who Gets Hit First: Mid-Career Engineers and Compliance-Heavy Teams

Willison’s analysis suggests mid-career engineers face the most immediate disruption because their day-to-day work often sits between junior execution and senior architecture. If AI agents can reliably convert specs into working code, the labor market could reward fewer builders and more designers, validators, and security specialists. That doesn’t mean software jobs vanish, but it does mean the job description shifts—potentially fast—toward systems thinking and process control.

For regulated industries, the tension could be sharper. Compliance frameworks often assume traceable human decision-making, documented reviews, and clearly assigned accountability. A dark-factory workflow can be designed to log everything, but logging is not the same as judgment. If the process becomes “a black box that turns specs into software,” oversight moves upstream: who wrote the spec, who approved the permissions, and who can stop the pipeline when something looks off?

What Conservatives Should Watch: Accountability, Concentrated Power, and Resilience

The politics here aren’t left-versus-right so much as citizen-versus-system. A future where critical software is built and updated by autonomous agents raises hard questions about transparency and control—especially when government contractors and large platforms adopt the same tooling. Limited government and individual liberty depend on systems that can be audited, explained, and challenged. If decision-making becomes too automated to understand, power concentrates in the hands of whoever owns the models, data, and deployment switches.

I asked @simonw what the next leap in AI software engineering is likely to be.

He explained the "dark factory" pattern where teams don't write any code or even look at their code. https://t.co/LWQPeaxml9 pic.twitter.com/2SmKC8bH4i

— Lenny Rachitsky (@lennysan) April 3, 2026

Based on the current research, the most grounded takeaway is not panic, but scrutiny. Dark factories are real in at least small-team form, and the incentives to scale them are obvious. The unresolved part is whether security and governance can keep pace once humans are removed from review loops. Until stronger evidence emerges—metrics, audits, and clear accountability structures—Americans should demand that any high-impact deployment keeps explainability and human responsibility in the chain.