
Vibe Coding Six Months In: The Honeymoon Is Over (But the Relationship Is Worth It)

Fast Company declared the hangover. A controlled trial proved developers are 19% slower with AI. 45% of generated code has security flaws. We are more bullish than ever. Here is why.

Kevan Roy
Founder and Lead Strategist
11 min read

In June, we wrote about the arrival of vibe coding – the new approach to building software by describing what you want in plain English and letting AI generate the code. We were genuinely excited. We were also honest about the limitations. Six months later, we have a lot more data, a lot more experience, and a much clearer picture of where this technology actually stands. Some of what we predicted has proven right. Some of it has proven conservative. And some things have happened that nobody saw coming.

The Good News Got Better

The tools have improved significantly since spring. Lovable, Bolt.new, and Cursor have all shipped major updates. The underlying language models are measurably more capable. The Wall Street Journal reported in July that vibe coding had crossed from hobbyist experimentation into professional use, with software engineers at established companies adopting these tools for commercial projects. This is no longer a novelty. It is becoming part of how software gets built.

The democratisation story is real, too. As of mid-2025, roughly 63% of active vibe coding users were non-developers: marketers, designers, product managers, and founders who previously had ideas but no path to execute them. For prototyping, internal tools, and idea validation, the value proposition we described in June has only gotten stronger.

Then the Hangover Arrived

In September, Fast Company published an article with a headline that captured what a growing number of developers had been feeling: "The vibe coding hangover is upon us." The piece documented a wave of frustration from senior engineers, with terms like "development hell," "toxic waste," and "AI babysitting" describing the experience of working with AI-generated codebases at scale.

Jack Zante Hays, a senior software engineer at PayPal who works on AI development tools, put it directly: "Code created by AI coding agents can become development hell." He described a pattern we have seen ourselves: vibe-coded projects work beautifully until the codebase reaches a certain size, at which point the AI tools "break more than they solve." He calls this the "complexity ceiling."

Stack Overflow’s 2025 developer survey added statistical weight to the anecdotes. While more than half of professional developers now use AI coding tools daily, 46% actively distrust their accuracy, compared to just 33% who trust them. Developer sentiment toward AI tools dropped from 70% positive in 2024 to 60% in 2025. Only 30% said the tools handle complex coding tasks well.

  • 46% of professional developers distrust the accuracy of AI coding tools (Stack Overflow 2025)
  • Positive developer sentiment toward AI tools fell from 70% to 60% in one year
  • Experienced developers were 19% slower with AI tools in a controlled trial (METR)
  • 45% of AI-generated code contains security vulnerabilities (Veracode 2025)

The METR Study That Changed the Conversation

The most significant piece of research to land since our June article came from METR, an AI evaluation organisation, in July 2025. They ran a rigorous randomised controlled trial with experienced open-source developers using AI coding tools on real tasks within familiar codebases.

The result: developers using AI tools took 19% longer to complete tasks than those working without them.

That is not a typo. Experienced developers were measurably slower with AI assistance. But here is the part that makes the finding truly fascinating: the same developers predicted they would be 24% faster with AI tools, and even after the trial, they still believed they had been 20% faster. The gap between perception and reality was 39 percentage points.

The METR finding does not mean AI coding tools are useless. It means they are not universally faster, and the feeling of productivity they create can be misleading. For experienced developers on familiar codebases, the overhead of reviewing, correcting, and integrating AI output exceeded the time saved by generating it. For greenfield projects and less experienced users, the calculus may be different.

The Security Problem Got Real

In our June article, we flagged the Lovable security vulnerability report as a warning sign. Since then, the evidence has mounted considerably.

Veracode’s 2025 GenAI Code Security Report, released in October, analysed over 100 AI models and found that 45% of AI-generated code contains security vulnerabilities. The study revealed something even more concerning: while AI models have become dramatically better at generating functional code over the past three years, the security of that code has shown essentially no improvement. Larger models are not better at generating secure code than smaller ones. The functionality gets better. The security does not.

We also saw the consequences play out in real-world incidents. In July, the Tea dating safety app suffered a major data breach that exposed 72,000 sensitive images, including user selfies and government identification. Security analysts traced the root cause to misconfigured cloud databases and broken API authentication, the type of security fundamentals that AI tools generate incorrectly and that non-developer users have no way of verifying.
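To make the failure class concrete: a minimal sketch in Python of the broken-authentication pattern. This is not Tea's actual code, and every name in it is hypothetical; it simply contrasts the endpoint logic AI tools commonly generate (fetch by ID, trust the caller) with the check a reviewer would insist on.

```python
# Illustrative only: hypothetical names, stand-in session store.
import hmac

VALID_TOKENS = {"tok_abc123": "user_1"}  # stand-in for a real session store

def get_user_images_insecure(user_id: str) -> list[str]:
    # The pattern AI tools often generate: fetch by ID with no ownership
    # check. Anyone who guesses a user_id can read another user's files.
    return [f"/uploads/{user_id}/selfie.jpg"]

def get_user_images_secure(token: str, user_id: str) -> list[str]:
    # Authenticate the caller, then authorise: the token must map to the
    # user whose files are being requested.
    caller = VALID_TOKENS.get(token)
    if caller is None or not hmac.compare_digest(caller, user_id):
        raise PermissionError("not authorised for this user's files")
    return [f"/uploads/{user_id}/selfie.jpg"]
```

Both versions pass a happy-path demo, which is exactly why a non-developer shipping vibe-coded output has no way to tell them apart.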

In August, SaaStr founder Jason Lemkin documented an experience where Replit’s AI agent deleted an entire database of executive contacts while working on a web application, despite explicit instructions not to make changes. Replit recovered the data, but the incident highlighted a fundamental risk: AI agents that can create can also destroy, and they do not always understand the difference.
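One mitigation for that risk is to put a hard gate between the agent and anything destructive. The sketch below is our own hypothetical pattern, not Replit's implementation: agent-generated SQL is refused unless a human has explicitly confirmed any destructive statement.

```python
# Hypothetical guardrail sketch: block destructive SQL from an AI agent
# unless a human has explicitly confirmed it.
import re

DESTRUCTIVE = re.compile(r"^\s*(drop|truncate|delete|alter)\b", re.IGNORECASE)

def execute_agent_sql(sql: str, run, human_confirmed: bool = False):
    """Run agent-generated SQL via the `run` callable, refusing
    destructive statements that a human has not confirmed."""
    if DESTRUCTIVE.match(sql) and not human_confirmed:
        raise PermissionError(f"destructive statement blocked: {sql.split()[0]}")
    return run(sql)
```

A read-only database role for the agent achieves the same thing at the infrastructure layer and is harder to bypass; the point is that the safety boundary should live outside the model, not in its instructions.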

What We Have Learned Building With These Tools

We use AI coding tools every day at Xavarro. Claude, Cursor, and several other tools are part of our development workflow. We are not writing this from the sidelines. We are writing it from the trenches. Here is what six months of intensive use has taught us.

First, AI is extraordinary for the first draft. Generating a component, scaffolding a page, building a data model, writing boilerplate – the speed gain is real and significant. We estimate our development velocity has increased 30 to 40% on initial builds. That is not hype. That is measured output.

Second, AI is terrible at the last mile. The final 20% of any project – edge cases, error handling, accessibility compliance, security hardening, performance optimisation – is where AI-generated code needs the most human intervention. And that 20% often takes 80% of the total time, which means the net productivity gain is smaller than the first-draft speed suggests.

Third, you must understand the code. Every line of AI-generated code that goes into our production systems is reviewed by a human who understands what it does and why. We have caught subtle bugs, security gaps, and architectural decisions that would have caused problems in production. The AI does not know your business logic. It does not know your users. It does not know your compliance requirements. You do.
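A typical example of the kind of bug review catches, sketched in Python with a hypothetical table: AI-generated code that builds SQL by string interpolation. It works in every demo, passes the happy-path tests, and is still an injection hole until a reviewer swaps in a parameterised query.

```python
# Illustrative review catch: string-built SQL vs. a parameterised query.
# Table and columns are hypothetical.
import sqlite3

def find_user_unsafe(conn, name: str):
    # The generated pattern: interpolates user input into SQL.
    # A crafted name like "' OR '1'='1" returns every row.
    return conn.execute(f"SELECT id FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(conn, name: str):
    # The reviewed fix: the driver binds the value, so it is treated
    # as data, never as SQL.
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()
```

Nothing in the AI's output flags the first version as wrong; only a human who knows what injection looks like does.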

The Honest Scorecard

Here is where we think vibe coding actually stands in November 2025, based on six months of building, breaking, and rebuilding:

  • Prototyping and validation: A+ – genuinely transformative, no caveats needed
  • Internal tools and dashboards: B+ – great for teams who have a developer to review the output
  • Production web applications: C – useful as an accelerator, dangerous as a substitute for engineering
  • Applications handling sensitive data: D – the security gap is too wide for anything involving personal data, payments, or healthcare
  • Complex, multi-system integrations: F – the complexity ceiling is real, and AI tools reliably break down past a certain codebase size

Where We Expect This to Go

Six months ago, we predicted the tools would get significantly better within 12 to 18 months. We still believe that. The underlying models are improving at an accelerating rate. The platforms are learning from the failure modes we have described. Lovable has already added a built-in security scan feature. The METR study’s authors noted that their findings applied to experienced developers on familiar codebases, and that the productivity calculus for greenfield projects and less experienced users may be quite different.

Collins Dictionary named "vibe coding" their Word of the Year for 2025. It has gone from an Andrej Karpathy tweet to a cultural phenomenon in eight months. The trajectory of adoption is undeniable. The question is whether the industry will mature from "accept all AI output without reading it" to "use AI as a powerful tool within a disciplined engineering process." The teams doing the latter are seeing real, sustainable productivity gains. The teams doing the former are building the technical debt that analysts predict will cost $1.5 trillion to unwind by 2027.

We are more bullish on AI-assisted development than we were in June. We are also more specific about what that means. It does not mean replacing developers. It does not mean accepting code you do not understand. It means using the most powerful development tools ever created with the same rigour you would apply to any other tool that builds things people depend on.

The vibes are still real. The hangover is real too. Both things can be true.

Ready to get started?

Find out where your website stands with a free AI Visibility Audit.

Start with your free audit