AI agents: 10x productivity remains a dream

This is part 1 of a two-part blog series on how to use AI for 10x productivity in the modern software engineering era.
There’s a lot of buzz around large language models (LLMs) writing code, and people are dreaming of using AI to build their entire software business from scratch. But let’s be real: the idea of AI models or platforms creating everything for you is just a fantasy. It’s not going to happen the way you might hope. From here onwards, I will refer to generative AI coding tools as agents.
AI agents constantly grapple with decision overload—juggling best practices, boilerplate code generation, software design, PRDs, security, and more. With so much to handle, they frequently make mistakes. The real issue? They don’t even realize when they’re messing up, leaving the errors for you to catch.
AI agents excel at handling specific, well-defined tasks with a narrow scope. However, when the scope expands—think real-world use cases, not just UI dashboards—they struggle, as they’re forced beyond their comfort zones and capabilities.
As a result, when an agent builds an entire project from scratch, its code needs constant refactoring, causing you to spend thousands of dollars and several months building your software.
Look at this example of a vibe-coded project by an indie hacker, and see how the issues spiral once the code ships: security vulnerabilities, exposed secrets and API keys, exposed DB endpoints, and breaches of real customers’ data. Can you really risk all that for the sake of a fast time to market?
In this blog, I walk you through some of the common problems: why they exist, and what they cost you if you are building a software business.
Speed of development comes at the cost of chaos in the codebase. You can save time while building your project, but you will lose out on maintainability.
Agents don’t understand best practices. While you can provide best practices as context to guide them, this approach has its pros and cons: it can improve output quality by aligning the agent with standards, but it’s not foolproof since agents may still misinterpret or inconsistently apply these guidelines. Relying on context also adds overhead, as you must continuously define and refine these practices, which can slow down the development process and introduce errors if not done meticulously.
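For illustration, here’s a minimal sketch of what “best practices as context” looks like in practice. It assumes the OpenAI Python SDK; the guideline text, model name, and task are all placeholders, not recommendations:

```python
# Minimal sketch: injecting team best practices as a system prompt.
# Assumes the OpenAI Python SDK (pip install openai); the guidelines,
# model name, and task below are placeholders.
from openai import OpenAI

GUIDELINES = """
- Follow PEP 8; keep functions under 30 lines.
- Never hardcode secrets; read them from environment variables.
- Reuse existing modules before writing new ones.
"""

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use whichever model backs your agent
    messages=[
        {"role": "system", "content": f"You are a coding agent. Follow these rules:\n{GUIDELINES}"},
        {"role": "user", "content": "Add a login endpoint to our Flask app."},
    ],
)
print(response.choices[0].message.content)
```

Note that nothing here enforces the guidelines. The model can still ignore or misapply them, which is exactly the overhead problem described above.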
Refactoring is a mess. The agent often rewrites existing, fully functional components, disrupting workflows and introducing new bugs. This redundant effort not only wastes time but also risks breaking stable parts of the codebase, creating more problems than it solves.
The models agents run on are probabilistic, not deterministic. They can predict a likely correct solution, but they lack the ability to determine which option is optimal among, say, five alternatives. For example, they don’t evaluate time or space complexity before responding, and they can’t work out how to shape a response so it fits your project’s existing design. This leads to inconsistent results for the same inputs.
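To make the complexity point concrete, here’s a hypothetical pair of answers an agent could emit for the same prompt, “find the duplicates in a list”. Both are correct; one is O(n²), the other O(n), and nothing in the sampling process compares the two before answering:

```python
# Two correct answers to the same prompt that an agent might
# produce on different runs. Both pass the same tests.

def find_duplicates_quadratic(items):
    # O(n^2): the membership test rescans the list for every element
    dupes = []
    for i, x in enumerate(items):
        if x in items[:i] and x not in dupes:
            dupes.append(x)
    return dupes

def find_duplicates_linear(items):
    # O(n): one pass with two sets
    seen, dupes = set(), set()
    for x in items:
        if x in seen:
            dupes.add(x)
        seen.add(x)
    return dupes

data = [1, 2, 3, 2, 4, 1]
print(find_duplicates_quadratic(data))    # [2, 1]
print(sorted(find_duplicates_linear(data)))  # [1, 2]
```

On a toy input the difference is invisible; on a million-row table it’s the difference between milliseconds and minutes, and the model never ran that comparison.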
Agents write more while you get less. More code is not more value: it erodes standardisation and brings more bugs, greater complexity, slower performance, and higher maintenance costs, with redundant loops, unnecessary variables, and overly verbose error handling.
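A hypothetical before/after of that bloat: the first version carries the redundant loop, throwaway variables, and over-verbose error handling agents often produce; the second does the same work:

```python
# Agent-style bloat: an extra pass, throwaway variables, and
# error handling that adds lines without adding safety.
def total_active_balances_verbose(accounts):
    active_accounts = []
    for account in accounts:
        if account.get("active"):
            active_accounts.append(account)
    total = 0
    for account in active_accounts:
        try:
            balance = account["balance"]
            total = total + balance
        except KeyError:
            balance = 0
            total = total + balance
    return total

# The same behaviour in one pass.
def total_active_balances(accounts):
    return sum(a.get("balance", 0) for a in accounts if a.get("active"))

accounts = [{"active": True, "balance": 100},
            {"active": False, "balance": 50},
            {"active": True}]
print(total_active_balances_verbose(accounts), total_active_balances(accounts))  # 100 100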
When best practices are not followed, you end up doing hotfixes on production code.
What is the root cause of all these problems? Can’t we solve it by providing additional context?
What the model does or doesn’t understand is a function of its vocabulary. A vocabulary is a fixed set of token IDs, where each ID maps to a fragment of text: an English word or subword, or a programming-language token (like def, const, or function).
This vocabulary is primarily English, secondarily code.
When you input code, it treats it like an English paragraph. It can understand the context of these paragraphs because it’s trained on vast datasets of a similar kind, which lets it grasp the context and meaning behind the code structure.
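You can see this “code as a paragraph of tokens” view directly. A quick sketch using tiktoken, the tokenizer behind OpenAI’s models (other vendors ship equivalents):

```python
# How a model "sees" code: a flat sequence of integer token IDs,
# each an index into its fixed vocabulary.
# Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models

snippet = "def greet(name):\n    return f'Hello, {name}!'"
token_ids = enc.encode(snippet)

print(token_ids)  # a list of integers, nothing more
for tid in token_ids[:6]:
    print(tid, repr(enc.decode([tid])))  # each ID maps back to a text fragment
```

A keyword like def maps to a token the model has seen countless times in training; an identifier from a library released after the training cutoff decomposes into rarer fragments, and that is where the statistics thin out.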
When the model has to infer things that fall outside its vocabulary, it leans more heavily on its biases. In other words, this is where the guesswork kicks in.
A growing vocabulary doesn’t automatically make the model better. Just look at how Claude 3.7 performs worse than Claude 3.5 on some tasks: optimizing a model for one thing can reduce its performance at others.
Problems arise when you ask for things that are not part of the model’s vocabulary, or where the neural ‘connection’ for a token is not strong enough. For example, when I asked Claude to integrate a Langchain/Langgraph workflow, it implemented deprecated functions in my codebase, then tried to fix them by refactoring. In my opinion, it’s hopeless to rely on an agent to integrate a recent package or library into your codebase. The agent had no concept of Langchain’s functions, so it fetched them from the documentation, but what it retrieved wasn’t accurate enough, because those ‘words’ were not in its vocabulary or training data. So it filled the gaps with bias and guesswork and gave me a response.
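To make this concrete, here’s the shape of that mistake as a sketch, not my exact code. The commented-out pattern saturates older tutorials and training data but is now deprecated; the second is the LCEL style that LangChain 0.2-era docs recommend. APIs this young move fast, which is precisely why an agent’s frozen training data falls behind:

```python
# Pattern an agent trained on older data tends to emit (now deprecated):
#   from langchain.chains import LLMChain
#   from langchain.prompts import PromptTemplate
#   chain = LLMChain(llm=llm, prompt=prompt)

# Pattern the current docs recommend instead (LCEL composition):
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("Summarize this diff:\n{diff}")
llm = ChatOpenAI(model="gpt-4o-mini")  # placeholder; any chat model works
chain = prompt | llm                   # the pipe operator replaces LLMChain

result = chain.invoke({"diff": "- old line\n+ new line"})
print(result.content)
```

The deprecated version still runs today, which is the trap: the agent’s output looks fine until the old API is finally removed.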
Agents have an inherent problem: they can mask guesswork as a confident, correct-looking response. You will never know that your code is running deprecated functions until something breaks in production. It’s a silent assassin.
Additionally, every time an AI agent makes a wrong guess in your codebase, you bear the cost, not the LLM provider, who remains unaffected by the inefficiency. It’s your responsibility to put a leash on these powerful agents so they behave properly.
It’s important to understand that an autonomous code-building agent is not a magic pill. You need checks and balances to keep the beast under control, or else there will be chaos.
Looking forward: The real promise of AI
An AI agent is like a junior dev who codes faster but produces dirty code more often than not. It needs constant supervision and course correction, so the focus shifts to validation, debugging, and maintainability.
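One cheap form of that supervision is to gate every agent-generated change behind the same checks you’d apply to a junior dev’s PR. A minimal sketch, assuming ruff and pytest as the linter and test runner; swap in your own toolchain:

```python
# Minimal guardrail: reject agent-generated changes that fail lint or tests.
# Assumes ruff and pytest are installed; these are example tools, not requirements.
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],           # style and common bug patterns
    ["pytest", "-q", "--maxfail=1"],  # your existing test suite
]

def gate_agent_changes() -> bool:
    for cmd in CHECKS:
        if subprocess.run(cmd).returncode != 0:
            print(f"Rejected: `{' '.join(cmd)}` failed. Send the diff back to the agent.")
            return False
    print("Checks passed. Changes are eligible for human review.")
    return True

if __name__ == "__main__":
    sys.exit(0 if gate_agent_changes() else 1)
```

Run it in CI or a pre-commit hook so agent output never reaches review, let alone production, without passing the bar humans are held to.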
In a world where writing code is easy and fast, bad architecture decisions made in the name of rapid prototyping can cause severe harm to your project.
Providing best practices, standardisation, architectural guidelines, and your project’s use case as added context doesn’t guarantee success. The agent’s output will still be unpredictable.
Third-generation software frameworks are dead. These frameworks are built for a specific language and tech stack. Since AI can write code in any tech stack, they no longer add value: they were made for human coders, not AI agents.
Fourth-generation frameworks, being language-agnostic, are the future, focusing on clean architecture, modular design, and best practices at a low level. They aim to build scalable, production-ready systems from day one by embedding fundamental abstractions, security practices, multi-level validations, and automation-driven processes.
The iceberg image above is an excerpt from part 2 of this blog series, which elaborates on the solution: how to bake in all the best practices, with no room for error, to build great products at scale without compromising on speed.
The bottom line
AI isn't making our software dramatically better because software quality was (perhaps) never primarily limited by coding speed. The hard parts of software development – understanding requirements, designing maintainable systems, handling edge cases, ensuring security and performance – still require human judgment.
What AI does do is let us iterate and experiment faster, potentially leading to better solutions through more rapid exploration. But only if we maintain our engineering discipline and use AI as a tool, not a replacement for good software practices. Remember: The goal isn't to write more code faster. It's to build better software. Used wisely, AI can help us do that. But it's still up to us to know what "better" means and how to achieve it.
What's your experience been with AI-assisted development? I'd love to hear your stories and insights in the comments.