The Human in the Loop Is Still the Bottleneck. And That's the Point.

It's been more than a year since my team started using AI aggressively in our software development. I want to share something uncomfortable that I haven't been able to shake. Honestly, I haven't seen any real increase in our overall productivity. We are producing half-baked proofs of concept at ten times our old speed. But the actual bottlenecks of building good software, the things that were slow before AI, are still slow now. Some of them are slower.

The marketing narrative around AI in engineering tends to focus on the parts the machine has obviously accelerated. Writing code. Generating tests. Drafting documentation. These are real wins, and I don't want to pretend they aren't. But the parts that haven't gotten faster, and there are a lot of them, are quietly becoming the bottlenecks that determine whether all of that machine speed actually turns into shipped value. Most of the time, it doesn't.

Context corruption is a human problem too

One of the more interesting things I've learned about LLMs over the last year is the concept of context corruption. As you keep adding more context to a model's window, the quality of its reasoning degrades. The information is technically all there, but the model can no longer hold it cleanly in mind. Important details get lost in the noise. The output becomes less reliable, even though the input got more comprehensive.

It took me a while to realize that this is exactly what happens to human reviewers too. A reviewer staring at a 15,000-line change isn't reviewing more carefully than they would on a 3,000-line change. They're reviewing less carefully. The volume exceeds their working memory. They start pattern-matching instead of reading. They approve large sections as "looks like boilerplate." They miss things that, in a smaller diff, they would have caught immediately.

→ The same total volume, very different outcomes

15,000 lines, one PR vs. five PRs of 3,000 lines

One mega PR (15,000 lines): The reviewer skims and approves based on the description. Bugs hide in sections that get categorized as boilerplate.

Five focused PRs (5 × 3,000 lines): Each review fits in the reviewer's head. Bugs surface. The change tells a story the reviewer can actually follow.

Counterintuitively, five separate reviews are often faster in total than one massive review, because each one fits in working memory and gets done properly. Context corruption is real for humans, and AI-generated PRs are pushing right past the limit of what a human can hold.

The AI didn't invent the problem of large changes. Large changes were always hard to review. What AI changed is that the cost of producing a large change collapsed to nearly zero, while the cost of reviewing one stayed exactly what it was. The pre-AI culture had implicit conventions that kept changes small because making them small took less effort than making them large. That asymmetry has reversed. Now making them large is the cheap path, and the review process is paying the cost.

The pipeline is only as fast as its slowest stage

Software development isn't just coding. It's a sequence of stages, each with its own time cost. AI has dramatically sped up some stages and barely touched others. The result is the kind of pipeline imbalance any engineer should recognize: the overall throughput is set by the slowest stage, not the fastest one.

→ Where AI helps, and where it doesn't

Time spent at each stage of the workflow

Code generation: about 10× faster (fast)
Documentation drafting: about 8× faster (fast)
Design: about the same (slow)
User research: about the same, or slower (slower)
Prioritization: about the same (slow)
Code review: slower per line of code (the bottleneck)

The machine accelerated the parts that were already fast. The slow parts are still slow. The pipeline runs at the speed of the slowest human stage, which is exactly where the work piles up.

You can prompt an AI to give you five design alternatives in one shot. Getting actual user feedback on those five designs is still slow, and arguably even slower than before because users are now seeing more proposals than they can meaningfully evaluate. Product managers are drowning in features that need prioritization. Engineers are building things left and right, but the question of which features are actually useful versus which are eye candy with no value still requires a human coordinating with another human to understand workflows, use cases, and priorities. That coordination takes its own time, and no amount of LLM speed accelerates it.

The machine got ten times faster. The humans around it stayed the same speed. The pipeline runs at the speed of the slowest part, and the slowest part is now everywhere except the machine.
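The slowest-stage argument is the classic bottleneck rule for a serial pipeline: every item has to pass through every stage, so sustained throughput can never exceed the capacity of the slowest one. Here is a minimal Python sketch of that rule. The stage names and per-week capacities are hypothetical, chosen purely for illustration rather than measured. It shows what a 10× boost to code generation alone does to the total:

```python
from typing import Dict, Tuple

# Hypothetical capacity of each stage, in features per week.
# These numbers are illustrative assumptions, not measurements.
BEFORE: Dict[str, float] = {
    "design": 4.0,
    "code_gen": 3.0,
    "code_review": 5.0,
    "user_research": 4.0,
}

# Make code generation 10x faster and leave every other stage unchanged.
AFTER: Dict[str, float] = dict(BEFORE, code_gen=BEFORE["code_gen"] * 10)


def bottleneck(capacity: Dict[str, float]) -> Tuple[str, float]:
    """Return the slowest stage and its capacity.

    In a serial pipeline every item passes through every stage, so the
    sustained throughput is capped by the stage with the lowest capacity.
    Ties resolve to the first such stage in insertion order.
    """
    stage = min(capacity, key=lambda name: capacity[name])
    return stage, capacity[stage]


if __name__ == "__main__":
    for label, caps in (("before", BEFORE), ("after", AFTER)):
        stage, rate = bottleneck(caps)
        print(f"{label}: {rate} features/week, limited by {stage}")
```

With these assumed numbers, the pipeline goes from 3 to 4 features per week, roughly 1.33×, and the constraint simply moves from code generation to design. If review capacity also drops because the diffs get larger, the gain could disappear entirely.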

The accountability question nobody wants to answer

The fashionable solution to the human-as-bottleneck problem is to remove the human. Let the agent write the code. Let the agent review the code. Let the agent ship the code. The full end-to-end autonomous workflow, with a human "expert" only stepping in to guide direction at a high level. On paper this sounds wonderful. In practice it runs into a question nobody seems to want to answer clearly.

If you remove the human checkpoint, who takes accountability when something breaks?

The agent? The LLM provider? The platform that orchestrated the agents? I haven't seen anyone in the industry seriously volunteer to take legal, reputational, or financial responsibility for what their AI produces. The terms of service of every major LLM provider explicitly disclaim it. Accountability is being quietly held by the human developer, who is increasingly removed from the loop while still being expected to own the outcome.

This is not a sustainable position. Either humans stay in the loop with real authority and real time to do the review properly, or someone else has to start footing the bill for the failures. So far, neither has happened.

What I've actually watched happen

→ Case study from the trenches

A legacy codebase, ten times the velocity, and the regression wave that followed

I watched a team start using AI aggressively to add features to a large legacy codebase. Within a few months the velocity numbers looked spectacular. Features were shipping at roughly 10x the prior pace. Leadership was happy. Stakeholders were happy. The release notes were full of new things.

Then the regressions started. Bugs popped up in areas the changes touched and in areas they didn't touch. Subtle interactions between modules that the AI had refactored without fully understanding. Edge cases that disappeared because the tests covering them had been "modernized" away. The same team is now spending two or three sprints just fixing the regressions they introduced during the velocity spike.

We'll stabilize eventually. But the side effects have outlasted the velocity gains. Pre-release customers are furious. The quality gates that used to be enforced before merging to mainline have visibly weakened. Manual QE engineers are demoralized, drowning in regression reports they can't keep up with. And a few engineers have already started blaming Claude for all of it, which is its own kind of unhealthy.

The net velocity, when you account for the rework, is roughly what it was before AI. Possibly worse. The team is more exhausted than they were a year ago. The product is less stable than it was a year ago. And the dashboard still says we're more productive than ever.

The "fail fast" trap

One of the things AI has genuinely enabled is the "fail fast" philosophy that startup culture has been preaching for a decade. You can spin up a POC in an afternoon, throw it at users, and kill it the next morning if it doesn't land. Cycle time has collapsed. That part is real.

What the philosophy doesn't tell you is the hidden cost of failure at high frequency. Too many failures, no matter how fast they happen, slowly degrade the mental health of the people doing the failing. "Hard reset and start from scratch" sounds great in a leadership all-hands. It feels different in week six of yet another deprecated initiative, when you've poured your attention into three half-built things and watched all of them get shelved.

We are not hardware. Our memory doesn't get wiped clean when a project ends. Every failure leaves a mark. Sometimes positive, in the form of real learning. More often negative: exhaustion, loss of confidence, self-doubt, and the slow erosion of the willingness to invest fully in the next thing, because the next thing might get killed too.

→ The pattern leaders need to see

Engineers who have been through too many cycles of "ship fast, fail fast, pivot fast" don't get faster at coping with it. They get more cynical, more cautious, and more checked out. The output looks the same on the dashboard, right up until it doesn't.

What I actually think is happening

After a year of watching this play out, my honest read of the situation comes down to a few specific points, none of which are popular to say out loud:

AI made code generation faster, not software development faster.

These are different activities. The code is a tiny fraction of the actual work. Speeding up the code is a real win, but it doesn't speed up the parts that were always the bottleneck: figuring out what to build, deciding if it's worth building, validating it with users, integrating it cleanly, and keeping the system healthy over time.

The human review bandwidth is the new constraint.

The team can produce more output than it can carefully review. So the careful review stops happening. The output ships anyway. The cost shows up as regressions, customer escalations, and a long tail of cleanup work that erases the velocity gain. The dashboards capture the gain. They don't capture the cleanup.

End-to-end AI workflows have an accountability hole.

The further you remove humans from the loop, the more important the question of "who is responsible when this breaks?" becomes. Right now, nobody at the AI provider level wants to answer it. Until someone does, full autonomy is a fantasy.

The human cost of "fail fast" at high frequency is real and underpriced.

Repeated failures don't bounce off engineers cleanly the way they do off a process diagram. The compounding emotional cost shows up in attrition, in cynicism, and in the quiet decision to stop bringing your best ideas to the table because the previous ones got killed.
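The first of those points, that AI sped up code generation rather than software development, is essentially Amdahl's law applied to a team. If a fraction p of the end-to-end effort is accelerated by a factor s and the rest is untouched, the overall speedup is 1 / ((1 - p) + p / s). Here is a minimal Python sketch of the formula. It assumes, purely for illustration, that coding is 20% of the total effort. That share is hypothetical, not measured:

```python
def overall_speedup(p: float, s: float) -> float:
    """Amdahl's law: end-to-end speedup when a fraction p of the work
    is accelerated by a factor s and the remaining (1 - p) is unchanged."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


# Hypothetical assumption: coding is 20% of the end-to-end effort.
CODING_SHARE = 0.2

if __name__ == "__main__":
    for factor in (2, 10, 100):
        speedup = overall_speedup(CODING_SHARE, factor)
        print(f"coding {factor}x faster -> overall {speedup:.2f}x")
```

Under that assumption, a coding stage that is 10× faster yields only about a 1.22× faster process end to end. No coding speedup, however large, can push past 1 / (1 - p) = 1.25×, because the other 80% of the work still takes as long as it always did.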

Where this leaves us

I don't think engineers, designers, or any creative profession can be fully replaced by AI, unless the end consumer of the product is also a machine. If you are building for human users, you will need human builders. The shape of the work might change, but it doesn't disappear. The taste, the judgment, the contextual understanding that only comes from actually living among other humans, these are not optional inputs to building software that humans will use.

Can efficiency improve? Yes, absolutely. But the model for how it improves matters. Industrialization improved manufacturing efficiency by orders of magnitude, but it didn't do it by removing humans from the loop overnight. It did it by carefully redesigning the work around the new capabilities, with safeguards, with explicit accountability, with regulated checkpoints, and with attention to the human cost when things went wrong. The factories that ignored that, that just scaled output without thinking about consequences, are the ones we now remember as cautionary tales.

→ The thesis, plainly

AI in software is going to make us much more productive, eventually. But it's going to be systematic productivity, not the haphazard kind we're chasing now. The teams that figure out where the human checkpoints should actually live, what they should actually be responsible for, and how to design the workflow around the new constraints will compound real gains. The teams treating AI as a velocity multiplier on the existing process will keep producing the regression waves we're seeing now.

The human in the loop isn't a bug to be eliminated. It's the part of the system that has the context, the accountability, and the taste. The job of leadership over the next few years isn't to remove that human. It's to figure out how to design a workflow where the human is doing the right work, at the right pace, with the right level of authority, while the machine does the parts it's genuinely better at.

We're not there yet. Most teams I see, including mine, are still in the messy middle, with all the speed of the new tools and none of the discipline to use them well. That's fine for a while. It's the natural shape of any technological transition. But the teams that stay in the messy middle too long are the ones that end up exhausted, burned out, and quietly losing ground to the teams that figured out the system.

The machine is faster than us. It will keep getting faster. The question that matters isn't how to make the human keep up. It's how to design a workflow where the human doesn't have to. Thanks for reading ✦
