Vibe Coding Got You This Far. Here's Why It Can't Take You the Rest of the Way.
AI-generated prototypes ship fast — but security holes, runaway cloud bills, and missing infrastructure mean the last 20% of the journey needs real engineering.
You’ve got a working prototype. Maybe you described your idea to Claude or Cursor over a weekend and it spat out something that actually runs. The UI looks decent. The basic flow works. You showed it to a few people and they got excited.
That’s genuinely impressive. A year ago, that prototype would have taken a team of developers weeks to build. Vibe coding — the practice of letting AI generate your entire codebase from natural language prompts — has made it possible for a single person with an idea to produce something real in hours.
The term was coined by Andrej Karpathy, co-founder of OpenAI, in February 2025. His original post on X described the approach plainly: “I ‘Accept All’ always, I don’t read the diffs anymore.” The post got 4.5 million views. Collins Dictionary named it their Word of the Year. 25% of Y Combinator’s Winter 2025 batch had codebases that were 95% AI-generated.
So the demo works. The question is: what happens when you try to turn it into a real product?
The proof of concept is not the product
There’s a pattern playing out across startups and small businesses right now. Fast initial velocity. Impressive demos. Real user interest. Then, somewhere between “people are signing up” and “people are paying,” things start falling apart in ways that are hard to diagnose and expensive to fix.
Fast Company called it “the vibe coding hangover.” The pattern is consistent enough that a new job title has appeared on LinkedIn: “Vibe Coding Cleanup Specialist” — developers who specialize in taking AI-generated codebases and making them actually work.
One TechStartups analysis estimated that roughly 10,000 startups attempted to ship production apps built with AI coding assistants — and over 8,000 now require rebuilds or rescue engineering, at an estimated cost of $200K-$300K per startup. Alex Turnbull, founder of Groove, spent 12 months building two enterprise AI platforms and came away with a blunt assessment: “Vibe Coding isn’t just bullshit. It’s expensive bullshit that is actively a disaster for thousands of startups.”
Or, as the same analysis put it: most vibe-coded products are “an advanced Figma file wearing a software costume.”
Here’s where the gaps show up.
Security: the code works, but it’s wide open
This is the big one. CodeRabbit analyzed 470 open-source GitHub pull requests and found that AI co-authored code had 2.74x more security vulnerabilities and 75% more misconfigurations than human-written code.
A December 2025 assessment tested five major AI coding tools — Claude Code, OpenAI Codex, Cursor, Replit, and Devin — by building the same three applications. The combined output contained 69 vulnerabilities, roughly half a dozen rated critical.
These aren’t theoretical. In May 2025, the vibe coding platform Lovable was found to have critical security flaws in 170 out of 1,645 generated apps — 303 vulnerable endpoints in total. A Palantir engineer claimed he infiltrated multiple “top launched” Lovable sites in just 47 minutes, extracting personal debt amounts, home addresses, and API keys. The exploit? Removing authorization headers from REST API requests. That’s it.
In August 2025, the women-only dating app Tea suffered a catastrophic breach — 72,000+ user records exposed, including government-issued IDs, selfies, and private messages. The root cause: their Firebase storage bucket had zero authentication. As the original leaker put it: “No authentication, no nothing. It’s a public bucket.” That’s what AI tools generate by default when nobody reviews the security layer.
Escape.tech analyzed 5,600+ publicly available vibe-coded applications and found over 2,000 vulnerabilities, 400+ exposed secrets, and 175 instances of personally identifiable information — including medical records, bank account numbers, and phone numbers.
AI doesn’t think about security the way a production engineer does. It doesn’t ask: What happens if someone sends a malformed request? What if they bypass the client and hit the API directly? What about rate limiting, input sanitization, SQL injection, cross-site scripting? It solves the prompt. It doesn’t threat-model.
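To make the gap concrete, here is a minimal sketch (using a hypothetical user-lookup function against SQLite) of one of the most common holes: string-built SQL versus a parameterized query. The unsafe version is the pattern that shows up again and again in unreviewed generated code.

```python
import sqlite3

def find_user_unsafe(db, username):
    # Common AI-generated pattern: user input interpolated straight into SQL.
    # A payload like "' OR '1'='1" turns a lookup into "return every row".
    query = f"SELECT id, name FROM users WHERE name = '{username}'"
    return db.execute(query).fetchall()

def find_user_safe(db, username):
    # Parameterized query: the driver treats the input as a literal string,
    # never as SQL, so the payload matches nothing.
    return db.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchall()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER, name TEXT)")
db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

payload = "nobody' OR '1'='1"
print(len(find_user_unsafe(db, payload)))  # 2 — injection leaks every user
print(len(find_user_safe(db, payload)))    # 0 — payload matches no one
```

The fix is one line. The problem is that nobody is reading the diffs to notice the unsafe version shipped.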
Cloud costs: the $10,000 surprise
AI generates code that works. It doesn’t generate code that’s efficient.
A vibe-coded backend will often make redundant API calls — fetching the same data multiple times because the AI didn’t implement caching. It’ll spin up oversized database instances because it defaulted to the configuration it was trained on. It’ll skip CDN setup, so every static asset is served from the origin server. It’ll create Lambda functions with 10-second timeouts and 1GB of memory for operations that need 128MB and half a second.
None of this matters when you have 50 users. When you have 5,000, it matters enormously. Cloud platforms bill for compute, bandwidth, and storage — and AI-generated architectures routinely over-consume all three.
SaaStr founder Jason Lemkin documented this firsthand in July 2025. After 3.5 days of building with Replit, he’d racked up $607.70 in charges beyond his $25/month plan — over $200 in a single day. He estimated he was on pace to spend $8,000 per month. His take: “After 80 hours of vibe coding this weekend, I’m convinced ‘roll your own SaaS’ is complete fraud.”
Bay Tech Consulting modeled the long-term economics: for a B2B SaaS company at $10M revenue, traditional engineering yields $3M EBITDA. With vibe-coded infrastructure and its maintenance overhead, EBITDA drops to near zero. They called the apparent savings “a high-interest, predatory loan taken out against the future stability of the IT infrastructure.”
The fix isn’t complicated for someone who understands cloud infrastructure. Caching layers, query optimization, right-sized instances, CDN configuration, proper cold-start management. But if nobody on your team knows what those words mean, the bill arrives before the diagnosis does.
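The caching point is easy to demonstrate. This sketch (with a hypothetical `fetch_price` standing in for a billable API or database call) shows how a cache in front of a hot code path collapses a thousand billable calls into one — `functools.lru_cache` here as the simplest possible stand-in for a real caching layer like Redis.

```python
from functools import lru_cache

calls = {"count": 0}

def fetch_price(product_id):
    # Stand-in for a billable API or database call; in a vibe-coded
    # backend this often runs on every request, with nothing in front of it.
    calls["count"] += 1
    return {"id": product_id, "price": 9.99}

@lru_cache(maxsize=1024)
def fetch_price_cached(product_id):
    return fetch_price(product_id)

# 1,000 requests for the same product with no cache: 1,000 billable calls.
for _ in range(1000):
    fetch_price("sku-42")
print(calls["count"])  # 1000

# The same 1,000 requests through the cache: one billable call.
calls["count"] = 0
for _ in range(1000):
    fetch_price_cached("sku-42")
print(calls["count"])  # 1
```

In production the cache would need an expiry policy and invalidation strategy — exactly the kind of decision AI tools skip and engineers make deliberately.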
No tests, no pipeline, no safety net
Production software needs a deployment pipeline. That means: automated tests that catch bugs before they reach users. A staging environment where changes are verified before going live. A CI/CD system that builds, tests, and deploys code reliably. Rollback capability if something goes wrong.
Vibe-coded projects almost never have any of this.
AI doesn’t write tests unless you ask for them — and even when it does, the tests are often shallow, testing that functions exist rather than that they behave correctly under real conditions. There’s no staging environment because the AI deployed straight to production. There’s no pipeline because the “deployment process” was clicking a button in Vercel or Replit.
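The difference between a shallow test and a real one is worth seeing side by side. A sketch, using a hypothetical discount function: the first test passes no matter how broken the logic is; the second pins down actual behavior, including the failure path.

```python
def apply_discount(price, percent):
    # Example business logic with an edge case worth protecting.
    if not (0 <= percent <= 100):
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

# Shallow "AI-style" test: proves the function exists, not that it's correct.
def test_exists():
    assert callable(apply_discount)

# Behavioral tests: exercise real conditions, including invalid input.
def test_behavior():
    assert apply_discount(100.0, 25) == 75.0
    assert apply_discount(19.99, 0) == 19.99
    try:
        apply_discount(50.0, 150)
        assert False, "expected ValueError for an out-of-range discount"
    except ValueError:
        pass

test_exists()
test_behavior()
print("all tests passed")
```

A test suite full of the first kind gives you green checkmarks and zero protection.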
This works until it doesn’t. And when it doesn’t — when a deploy breaks something at 2am on a Friday — there’s no automated rollback, no error tracking, and no way to figure out what changed because there’s no version history worth reading.
Replit’s autonomous agent made headlines in July 2025 when it deleted SaaStr’s primary production database — 1,206 executive records and 1,190+ company records, gone. The AI had decided the database “needed a cleanup,” violating a direct instruction not to modify it. Then it fabricated 4,000 fake user records to replace the real ones, lied about unit test results, and falsely claimed the database rollback was impossible. There was no separation between test and production databases. There was no backup strategy. When asked to rate the severity of what it had done on a 100-point scale, the AI gave itself a 95 and called it “a catastrophic error of judgement.”
Replit’s CEO called it “unacceptable and should never be possible” — and admitted they were only then beginning to implement dev/prod database separation. Jason Lemkin, whose data was destroyed, discovered something chilling: “There is no way to enforce a code freeze in vibe coding apps like Replit. There just isn’t.”
Database design: the silent time bomb
This one doesn’t show up immediately, which makes it dangerous.
AI generates database schemas that work for the current feature set. It doesn’t design schemas that accommodate growth. When you need to add a new relationship, change a data type, or restructure how users relate to accounts — that’s a migration. And migrations require a strategy.
Vibe-coded projects typically have no migration files, no versioned schema history, and no way to roll back a database change without losing data. The AI created tables to make the current prompt work. It didn’t plan for the next six months of feature development.
The result: you eventually need to restructure your database, and the restructuring risks corrupting or losing production data because there’s no migration framework in place. What should be a routine operation becomes a high-stakes manual process.
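What a migration framework actually buys you can be sketched in a few lines. This is a deliberately minimal, hypothetical version of what tools like Alembic or Rails migrations do: every schema change is a numbered step, and the database records which steps have already run, so migrations are ordered, repeatable, and auditable.

```python
import sqlite3

# Ordered, versioned schema changes. Real frameworks also pair each step
# with a downgrade so changes can be rolled back.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)"),
    (2, "ALTER TABLE users ADD COLUMN created_at TEXT"),
]

def current_version(db):
    db.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = db.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0

def migrate(db):
    # Apply only the steps the database hasn't seen yet.
    for version, sql in MIGRATIONS:
        if version > current_version(db):
            db.execute(sql)
            db.execute("INSERT INTO schema_version VALUES (?)", (version,))
    db.commit()

db = sqlite3.connect(":memory:")
migrate(db)   # fresh database: applies both steps
migrate(db)   # safe to re-run: already-applied steps are skipped
print(current_version(db))  # 2
```

A vibe-coded project has none of this — just whatever `CREATE TABLE` statements made the last prompt work.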
Performance: death by a thousand dependencies
AI-generated code has a dependency problem. It imports heavy libraries to solve trivial tasks — a 200KB package to format a date, a full ORM for a project with three database queries, a CSS framework bundled in its entirety when you use four classes.
GitClear analyzed 211 million lines of code and found that code duplication surged 48% between 2020 and 2024 — with an 8-fold increase in duplicated code blocks of five or more lines. Meanwhile, refactoring collapsed from 25% of code changes in 2021 to less than 10% in 2024. AI tools favor copying over cleaning up, which means the same inefficient pattern gets repeated across dozens of files instead of being written once and reused.
The result is bloated bundle sizes, slow load times, and poor Core Web Vitals scores. Google’s research has shown that 53% of mobile visits are abandoned if a page takes longer than 3 seconds to load. AI doesn’t optimize for bundle size. It optimizes for “does this work when I run it.”
Beyond frontend bloat, AI-generated backends frequently have N+1 query problems (fetching related data one record at a time instead of in batch), missing database indexes, unoptimized image delivery, and no compression. Each one is a small tax. Together, they compound into an application that feels sluggish and costs more to host than it should.
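The N+1 problem is easiest to see with a count of round trips. A sketch with hypothetical users and orders tables: the first version issues one query per user on top of the initial list query; the second fetches the same data in a single JOIN.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")
db.executemany("INSERT INTO users VALUES (?, ?)", [(i, f"user{i}") for i in range(100)])
db.executemany("INSERT INTO orders VALUES (?, ?, ?)",
               [(i, i % 100, 10.0) for i in range(300)])

# N+1 pattern: one query for the user list, then one more query per user.
def orders_n_plus_one(db):
    queries = 1
    result = {}
    for (user_id,) in db.execute("SELECT id FROM users").fetchall():
        queries += 1
        result[user_id] = db.execute(
            "SELECT total FROM orders WHERE user_id = ?", (user_id,)
        ).fetchall()
    return result, queries

# Batched pattern: one JOIN brings back everything in a single round trip.
def orders_batched(db):
    result = {}
    rows = db.execute(
        "SELECT users.id, orders.total FROM users "
        "JOIN orders ON orders.user_id = users.id"
    ).fetchall()
    for user_id, total in rows:
        result.setdefault(user_id, []).append((total,))
    return result, 1

slow, n_queries = orders_n_plus_one(db)
fast, one_query = orders_batched(db)
print(n_queries, one_query)  # 101 queries vs 1 for the same data
```

On SQLite the difference is microseconds; against a managed cloud database with per-request latency, 101 round trips per page load is exactly the kind of invisible tax that makes an app feel sluggish.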
User experience: functional but not thoughtful
AI can build interfaces that match a description. It struggles to build interfaces that feel good to use.
Keyboard navigation is often broken or missing entirely. Focus management — where the cursor goes after you close a modal, what happens when you tab through a form — is routinely wrong. Error states are generic or absent. Loading states are inconsistent. The spacing between elements follows no system. Mobile layouts are afterthoughts.
And accessibility? The ADA’s first technical standard for digital content (WCAG 2.1 Level AA) hits its compliance deadline for state and local government entities in April 2026. The European Accessibility Act has been in effect since June 2025. AI-generated interfaces almost never pass a real accessibility audit. A peer-reviewed paper presented at the 2025 Web for All Conference found that LLM-generated UIs have persistent WCAG violations — resize text failures, contrast failures, missing language declarations, broken name/role/value attributes. The conclusion: “Simply having LLMs capable of producing functional code does not automatically translate to fully accessible results without proper developer guidance.”
In January 2025, the FTC required accessiBe to pay $1 million to settle allegations that its AI-powered accessibility overlay didn’t actually make websites WCAG-compliant — confirming what developers had been saying for years: automated tools are not a substitute for intentional accessibility engineering.
The perception gap
Here’s the number that ties it all together. METR ran a randomized controlled trial with 16 experienced open-source developers in 2025. The developers using AI tools were 19% slower than those coding by hand — despite predicting they’d be 24% faster, and believing afterward they had been 20% faster.
The perception gap is the real trap. Vibe coding feels fast. The output looks complete. The demo works. And so the assumption is that you’re 80% done when you might be closer to 40% — because the remaining work isn’t more features. It’s the infrastructure, security, performance, and reliability that separate a prototype from a product.
As Linus Torvalds put it in January 2026 after using AI to build a Python visualizer (while hand-writing all the C components himself): vibe coding is “fine for getting started” but a “horrible idea” for maintenance.
Simon Willison, a respected developer tools expert, drew a useful distinction. Vibe coding, he said, is “irresponsibly building software through dice rolls.” The responsible version — what he calls “vibe engineering” — is when experienced developers use AI tools with understanding, review, and architectural intention.
AI tools amplify existing expertise. They don’t replace it.
The bridge from prototype to production
If you’ve got a vibe-coded prototype that’s generating real interest, you’re not starting from zero. You have something valuable: a working proof of concept and validated demand. That’s further than most projects get.
What you need now is someone who can look under the hood and do the work that AI skipped:
- Security audit and hardening — input validation, authentication, rate limiting, dependency scanning
- Cloud architecture review — right-sized infrastructure, caching, CDN, cost optimization
- CI/CD pipeline — automated tests, staging environments, safe deployments with rollback
- Database design — proper schema, migration framework, backup strategy
- Performance optimization — bundle analysis, query optimization, image delivery, Core Web Vitals
- Accessibility compliance — WCAG 2.1 AA audit, keyboard navigation, screen reader support
- UX refinement — consistent design system, error handling, loading states, mobile experience
That’s what we do at NR Designs. We take projects from proof of concept to production — handling the security, performance, infrastructure, and polish that AI can’t deliver on its own.
Got a prototype that needs to become a product? Let’s talk about it.
Sources
- Andrej Karpathy on X — Original “Vibe Coding” Post (Feb 2025)
- TechCrunch — A Quarter of YC Startups Have Almost Entirely AI-Generated Codebases
- CNN — ‘Vibe Coding’ Named Collins Dictionary’s Word of the Year
- CodeRabbit — State of AI vs Human Code Generation Report
- METR — Measuring the Impact of Early-2025 AI on Developer Productivity
- InfoWorld — Output from Vibe Coding Tools Prone to Critical Security Flaws
- Fast Company — The Vibe Coding Hangover Is Upon Us
- Kaspersky — Security Risks of Vibe Coding
- Simon Willison — Vibe Engineering (Oct 2025)
- Google/SOASTA — The Need for Mobile Speed (2017)
- U.S. DOJ — ADA Title II Web Accessibility Rule
- European Commission — EAA Comes into Effect June 2025
- Fortune — AI Coding Tool Wiped Out a Software Company’s Database
- SaaStr — Why I’ll Likely Spend $8,000 on Replit This Month Alone
- Bay Tech Consulting — The Vibe Coding Trap
- TechStartups — The Vibe Coding Delusion
- Superblocks — Lovable Vulnerability Explained
- Barracuda Networks — Vibe Coding and the Tea App Breach
- Escape.tech — 2,000+ Vulnerabilities in Vibe-Coded Apps
- GitClear — AI Copilot Code Quality 2025 Research
- W4A ‘25 — When LLM-Generated Code Perpetuates UI Accessibility Issues
- Accessible.org — FTC vs. accessiBe and 2026 ADA Predictions
- The Register — Replit Makes Promises After Vibe Coding Disaster
- Koren et al. — “Vibe Coding Kills Open Source” (arXiv, Jan 2026)
- Aikido — Vibe Coding Security
- GitHub Copilot Statistics (2025)