The Myth of a 19% Productivity Drop from AI

Where the headline came from

In early July 2025 the nonprofit Model Evaluation & Threat Research (METR) released a randomized controlled field study in which 16 experienced maintainers worked on 246 real issues in the open-source projects they steward. When AI tools were allowed (primarily Cursor Pro with Claude 3.5/3.7 Sonnet), tasks took 19% longer to finish, even though the same developers had predicted a 24% speedup before starting.

The press translated that delta into punchy headlines like “AI slows down software development by 19%.” The nuance, however, lives in the paper’s method section rather than the sound bite.

Why “no‑AI” wasn’t really measured

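METR did not time any single issue both with and without AI. Each issue was randomly assigned to an AI-allowed or AI-disallowed condition, so the 19% figure compares different issues across the two groups, and the no-AI time for an AI-assisted issue is only ever an estimate. Before starting, developers also forecast how long each issue would take with and without AI. Seasoned engineers know those estimates are optimism dressed up as numbers; slippage is routine even for tasks in codebases you own. Comparing results with pre-work guesses almost guarantees a gap between forecast and reality, with or without AI.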
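A toy simulation makes the forecast-versus-reality point concrete. It assumes, purely for illustration, that AI has no effect at all and that developers forecast roughly 80% of a task's real duration. The `optimism` factor, the lognormal task-time distribution, and the function name `simulate_forecast_gap` are invented for this sketch, not taken from METR.

```python
import random
import statistics


def simulate_forecast_gap(n_tasks: int = 200, optimism: float = 0.8, seed: int = 0) -> dict[str, float]:
    """Mean actual/forecast ratio per arm in a hypothetical world where AI has zero effect.

    `optimism` is an assumed bias: forecasts land near that fraction of real time.
    """
    rng = random.Random(seed)
    ratios: dict[str, list[float]] = {"ai_allowed": [], "ai_disallowed": []}
    for _ in range(n_tasks):
        actual = rng.lognormvariate(4.0, 0.6)  # right-skewed task time, median ~55 min
        # Noisy, systematically optimistic forecast made before work starts.
        forecast = actual * optimism * rng.uniform(0.9, 1.1)
        # Random assignment to an arm; the arm does NOT change `actual`.
        arm = "ai_allowed" if rng.random() < 0.5 else "ai_disallowed"
        ratios[arm].append(actual / forecast)
    return {arm: statistics.mean(values) for arm, values in ratios.items()}


print(simulate_forecast_gap())
```

Both arms overrun their forecasts by roughly a quarter even though the "AI" changes nothing: the gap comes from the optimistic guesses, not from the tool.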

Contrast that with GitHub’s 2023 Copilot experiment, which used a classic lab-style A/B design and found a 55% speedup on an HTTP-server kata. Different measurement scaffolding, different outcome.
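The two headline numbers also look more symmetric than they are. Both describe a change in completion time, and that maps onto throughput non-linearly. A minimal helper, normalizing each baseline to 100 minutes purely for illustration (the 100-minute baseline and the name `time_change_summary` are assumptions, not either study's data):

```python
def time_change_summary(baseline_minutes: float, treated_minutes: float) -> tuple[float, float]:
    """Return (percent change in completion time, throughput multiplier).

    Negative percent change means the treated group finished faster.
    Throughput multiplier = baseline / treated: tasks the treated group
    finishes in the time the baseline group finishes one.
    """
    if baseline_minutes <= 0 or treated_minutes <= 0:
        raise ValueError("durations must be positive")
    pct_change = (treated_minutes - baseline_minutes) / baseline_minutes * 100
    return pct_change, baseline_minutes / treated_minutes


# Copilot lab study: roughly 55% less time, i.e. 45 minutes per 100 baseline.
copilot = time_change_summary(100, 45)
# METR field study: 19% more time, i.e. 119 minutes per 100 baseline.
metr = time_change_summary(100, 119)
print(f"Copilot: {copilot[0]:+.0f}% time, {copilot[1]:.2f}x throughput")
print(f"METR:    {metr[0]:+.0f}% time, {metr[1]:.2f}x throughput")
```

A 19% increase in time is about a 16% drop in throughput, while a 55% cut in time more than doubles it.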

AI sped up typing, but slowed down everything around it

Screen recordings show developers flying through boilerplate once the model produced a reasonable draft. The lost time accumulated elsewhere:

Prompt loops (phrasing, re-phrasing, and pruning the context window) consumed significant time. Model inference latency, especially under high load, added further delays. Review and validation became major time sinks as well: reading unfamiliar suggestions, cross-checking docstrings, running tests, and reverting hallucinated code.

Those steps are meta work, not coding, yet they still count toward “time on task.” In the study they outweighed the raw typing gains.
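A minimal sketch of that accounting shows how typing savings can be swamped once meta work is booked to the same task. The phase names, the `PhaseLog` and `compare` helpers, and the minute counts are all hypothetical, not METR measurements.

```python
from dataclasses import dataclass


@dataclass
class PhaseLog:
    """Minutes spent per phase on one task. Phase names are illustrative."""
    typing: float = 0.0
    prompting: float = 0.0         # phrasing, re-phrasing, pruning context
    waiting_on_model: float = 0.0  # inference latency
    validating: float = 0.0        # reading suggestions, running tests, reverting

    @property
    def meta(self) -> float:
        return self.prompting + self.waiting_on_model + self.validating

    @property
    def time_on_task(self) -> float:
        # Meta work is booked to the task exactly like typing is.
        return self.typing + self.meta


def compare(without_ai: PhaseLog, with_ai: PhaseLog) -> dict[str, float]:
    return {
        "typing_saved": without_ai.typing - with_ai.typing,
        "meta_added": with_ai.meta - without_ai.meta,
        # Positive means the AI-assisted task took longer end to end.
        "net_minutes": with_ai.time_on_task - without_ai.time_on_task,
    }


# Made-up numbers for illustration only; these are not METR measurements.
baseline = PhaseLog(typing=60, validating=10)
assisted = PhaseLog(typing=25, prompting=20, waiting_on_model=12, validating=25)
print(compare(baseline, assisted))
```

Here typing shrinks by 35 minutes, but 47 extra minutes of meta work make the task 12 minutes slower overall.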

The shifting bottleneck principle

Think of your delivery pipeline as a chain. AI made the coding link stronger but left the others unchanged, so the weakest link simply moved downstream. The result looks like a slowdown even though one segment accelerated.
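The chain metaphor maps onto a simple rule for serial pipelines: steady-state throughput is capped by the slowest stage. A minimal sketch with hypothetical stage names and capacities in tasks per day:

```python
def bottleneck(stage_capacity: dict[str, float]) -> tuple[str, float]:
    """Return the slowest stage and the pipeline's throughput (tasks/day).

    In a serial pipeline, steady-state throughput is capped by the
    lowest-capacity stage, no matter how fast the others are.
    """
    stage = min(stage_capacity, key=stage_capacity.get)
    return stage, stage_capacity[stage]


# Hypothetical capacities in tasks per day.
before = {"spec": 6.0, "coding": 4.0, "review": 5.0, "deploy": 8.0}
after = {**before, "coding": 10.0}  # AI strengthens only the coding link

print(bottleneck(before))  # ('coding', 4.0)
print(bottleneck(after))   # ('review', 5.0)
```

Coding capacity rose 2.5x, yet end-to-end throughput only moved from 4 to 5 tasks a day and the constraint shifted to review. This toy model also ignores the extra prompting and validation work, which can erase even that gain.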

Historically, we have seen the same pattern whenever tooling leaps ahead: continuous integration surfaced the merge-to-test queue as the real drag; container orchestration revealed how much time ops spent on secrets and networking. AI assistants are now exposing friction in spec grooming, knowledge sharing, and review throughput.

How to harvest AI gains for real projects

Move AI left by generating or validating specs before coding begins. Cache context by scripting the retrieval of relevant files and tests so prompts stay concise. Parallelize review by treating AI output like a teammate’s PR: smoke-test one chunk while the model is still generating the next. Instrument the workflow to measure queue times, not just keystrokes, because bottlenecks often hide in phases nobody times. Finally, re-estimate tasks after adopting AI using empirical cycle-time data rather than memory, and plan from those numbers.
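Measuring queue time can start small. The sketch below assumes a hypothetical event log of `(task, phase, started, finished)` tuples; real timestamps might come from an issue tracker, CI, or a code-review tool, and the helper name `wait_minutes_by_phase` is invented here. It sums the idle gap before each phase across tasks:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: (task, phase, started, finished).
EVENTS = [
    ("T1", "spec", "2025-07-14T09:00", "2025-07-14T09:30"),
    ("T1", "coding", "2025-07-14T09:45", "2025-07-14T10:30"),
    ("T1", "review", "2025-07-14T13:30", "2025-07-14T14:00"),
    ("T2", "spec", "2025-07-14T09:00", "2025-07-14T09:20"),
    ("T2", "coding", "2025-07-14T09:20", "2025-07-14T10:00"),
    ("T2", "review", "2025-07-14T15:00", "2025-07-14T15:15"),
]


def wait_minutes_by_phase(events):
    """Total idle minutes spent queued before each phase, summed over tasks."""
    spans_by_task = defaultdict(list)
    for task, phase, start, end in events:
        spans_by_task[task].append(
            (datetime.fromisoformat(start), datetime.fromisoformat(end), phase)
        )
    waits = defaultdict(float)
    for spans in spans_by_task.values():
        spans.sort()  # order each task's phases by start time
        for (_, prev_end, _), (start, _, phase) in zip(spans, spans[1:]):
            # Gap between one phase finishing and the next one starting.
            waits[phase] += (start - prev_end).total_seconds() / 60
    return dict(waits)


print(wait_minutes_by_phase(EVENTS))  # {'coding': 15.0, 'review': 480.0}
```

In this toy log, active work is short everywhere, but tasks wait eight hours in total for review: exactly the kind of invisible phase that keystroke metrics miss.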

Teams that iterate on process as aggressively as they tune prompts usually see the headline effect flip: the assistant becomes a net accelerator rather than an exotic text editor.

Before you call the hype police

No single study settles the debate. METR examined veteran contributors inside mature codebases, arguably a worst case for AI given deep context and strict quality bars. Rookies tackling greenfield code, by contrast, often report dramatic speedups.

Copilot’s own field telemetry suggests that, on average, 46% of paid users’ code is now machine‑generated, and Microsoft claims noticeable productivity lifts in internal cohorts. But telemetry is not a controlled experiment, and lab wins do not always translate to production, as METR just reminded us.

AI pair programmers are real, but so are Little’s Law and Conway’s Law. If you drop a faster coder into an unchanged pipeline, throughput will plateau while wait states mushroom elsewhere. Speed follows systems thinking.
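Little’s Law (L = λW: average work in progress equals average throughput times average time in the system) turns that claim into arithmetic. The sketch below uses invented numbers, a review stage that completes at most 5 PRs a day and an AI-assisted coding stage that now opens 8 a day; the helper names are hypothetical.

```python
def lead_time_days(wip: float, throughput_per_day: float) -> float:
    """Little's Law, L = lambda * W, solved for W (average time in system).

    Strictly holds for long-run averages of a stable system.
    """
    if throughput_per_day <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput_per_day


def backlog_growth(days: int, arrivals_per_day: float, capacity_per_day: float) -> float:
    """Extra items queued after `days` when arrivals outpace a stage's capacity."""
    return max(0.0, (arrivals_per_day - capacity_per_day) * days)


# Before AI: 4 PRs/day arrive and merge; about 6 PRs sit in review on average.
print(lead_time_days(wip=6, throughput_per_day=4))          # 1.5 days

# After AI: 8 PRs/day arrive, but review still completes at most 5/day.
extra = backlog_growth(days=10, arrivals_per_day=8, capacity_per_day=5)
print(extra)                                                # 30 more PRs waiting
print(lead_time_days(wip=6 + extra, throughput_per_day=5))  # 7.2 days
```

Throughput plateaus at review capacity while lead time nearly quintuples. Because arrivals exceed capacity, this queue never reaches steady state, so read the 7.2 days as a snapshot at day 10 rather than a stable average.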

Until our workflows evolve to exploit AI’s spike in keystroke velocity, we will keep trading coding minutes for prompting minutes, and wondering why the sprint board still closes at 5 p.m.

Need senior engineering leadership?

Engage a partner-led engineering firm that agrees on fixed fees, written scope, and accountability for outcomes instead of hours.

Access to semperMade's services is highly selective and subject to approval.