A few weeks ago, a team I was advising tried something that sounded perfectly reasonable. They had an existing e-commerce platform, about five years old, built as a modular monolith with a few extracted microservices. The business wanted a recommendation engine integrated into the product catalog. The team had been using AI coding assistants for months and felt confident. They pointed the AI at the codebase, described the feature, and asked it to generate an integration plan.
What came back was technically coherent. The AI proposed new components, suggested API contracts, and even generated initial code. The problem was that it missed a fraud detection middleware that sat between the order service and the payment service, a component that wasn’t obvious from the code structure alone. It also didn’t account for a data validation step with an external address service that had been added as a workaround two years earlier and was never documented.
The team spent three days debugging integration failures before someone realized the AI had been building on an incomplete understanding of the system. The generated code wasn’t wrong in isolation. It was wrong in context.
This is the brownfield problem. And almost nobody is talking about it.
The Greenfield Illusion
Most of the conversation around AI-driven development assumes greenfield conditions. New project, clean slate, no legacy constraints. In that world, AI shines. You describe an intent, AI decomposes it into units of work, generates domain models, produces code, and the team validates. I wrote about this process in Reimagine, Don’t Retrofit, and it works.
But here’s the reality most enterprises live in: they aren’t building new systems. They’re evolving existing ones.
Various industry analyses consistently show that roughly 70 to 80 percent of enterprise IT budgets go toward maintaining and enhancing existing systems, not building new ones. The average enterprise codebase is years old and carries layers of accumulated decisions, undocumented workarounds, and tribal knowledge that lives in the heads of developers who may or may not still be on the team.
When you point AI at that kind of system without preparation, you get exactly what that e-commerce team experienced: technically plausible output built on incomplete understanding. The AI doesn’t know what it doesn’t know. And neither does the team, until something breaks.
I’ve written before about the mismatch between AI code assistants and enterprise software development and about the governance gap that causes AI initiatives to fail at scale. Both posts diagnosed the tension between AI capability and enterprise reality. This post addresses the specific, practical dimension that makes brownfield so treacherous: AI cannot safely reason about or modify a system it doesn’t understand.
Why Brownfield Breaks AI
In greenfield development, AI starts with a blank canvas. Every artifact it generates, from user stories to domain models to code, is self-consistent because there’s nothing else to be consistent with.
Brownfield is fundamentally different. The AI must work within an existing system that has:
Hidden dependencies. Components that interact in ways not visible from the code structure alone. Middleware, event buses, shared databases, and cross-cutting concerns that only surface at runtime or through institutional knowledge.
Undocumented decisions. Every mature codebase carries design decisions that made sense at the time but were never recorded. Why does the order service call the fraud detection service before payment? Because of an incident three years ago that nobody wrote down.
Tribal knowledge. The workarounds, the “don’t touch that module” warnings, the implicit contracts between services that exist only in the minds of the developers who built them. When those developers leave, the knowledge leaves with them.
Accumulated technical debt. Shortcuts, temporary fixes that became permanent, deprecated patterns still in production. AI treats all code as equally intentional. It can’t distinguish between a deliberate design choice and a hack that nobody had time to clean up.
The result is predictable. AI generates code that looks correct in isolation but breaks in integration. It proposes changes that conflict with constraints it can’t see. It creates regression risks in areas the team didn’t think to test because the dependencies weren’t visible.
This isn’t an AI limitation that will be solved by better models. It’s a context problem. And context, in brownfield systems, doesn’t live in the code. It lives in the space between the code and the people who built it.
Code Elevation: Making the Invisible Visible
The AI-DLC methodology addresses this with a concept I find both simple and essential: Code Elevation.
Code Elevation is a mandatory pre-phase for any brownfield work. Before AI participates in elaboration, design, or construction, the team first converts the existing codebase into semantic models that both AI and humans can use as shared context.
The process has three stages:
Stage 1: Static Analysis. AI analyzes the code structure, dependencies, components, interfaces, and relationships. The output is a static model: a map of what exists in the system and how the pieces connect. Think of it as the anatomy of the codebase, the skeleton and the organs.
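What Stage 1 produces can be pictured with a toy example. The sketch below, using plain Python `ast` and illustrative module names, maps each module to its in-system imports; a real elevation pass would use far richer analysis, but the shape of the output is the same kind of dependency map:

```python
import ast

def extract_dependencies(module_sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each module to the set of in-system modules it imports.

    module_sources maps module names to their source code; anything
    imported from outside that set (e.g. the standard library) is
    dropped, so the result is a dependency graph of the system itself.
    """
    graph: dict[str, set[str]] = {}
    for name, source in module_sources.items():
        deps: set[str] = set()
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Import):
                deps.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                deps.add(node.module)
        graph[name] = deps & set(module_sources)
    return graph

# Illustrative sources, loosely modeled on the e-commerce example.
sources = {
    "order_service": "import payment_service\nimport product_catalog\n",
    "payment_service": "import logging\n",
    "product_catalog": "",
}
graph = extract_dependencies(sources)
# order_service depends on both in-system modules; payment_service's only
# import is the standard library, so its in-system dependency set is empty.
```

What this cannot see, of course, is the runtime wiring and the undocumented middleware, which is exactly why the next two stages exist.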
Stage 2: Dynamic Analysis. AI analyzes execution flows for the most significant use cases. The output is a dynamic model: a representation of how components interact at runtime. This is the physiology, how the system actually behaves when it does its job.
Stage 3: Technical Debt & Developer Review. This is where the human 20 percent makes the difference. Developers who know the system validate, correct, and enrich the models. They add the fraud detection middleware the AI missed. They annotate the workaround that isn’t obvious from the code. They flag the component that “works but shouldn’t be touched.”
The entire process typically takes two to four hours for a well-scoped system. Larger or more complex codebases, particularly tightly coupled monoliths or systems with extensive undocumented integrations, may require multiple sessions. Scope determines duration, and that’s by design. The output is a validated set of models that become the context for everything that follows: Mob Elaboration, Mob Construction, testing, and operations.
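AI-DLC doesn't prescribe a file format for these models, so the shape below is purely illustrative: a handful of Python dataclasses holding the static components, the dynamic flows, and the Stage 3 developer annotations, using the fraud-middleware example from earlier. All names here are my own, not part of the methodology:

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """A piece of the static model: what exists and what it connects to."""
    name: str
    responsibilities: str
    depends_on: list[str] = field(default_factory=list)

@dataclass
class Flow:
    """A runtime interaction captured during dynamic analysis."""
    use_case: str
    steps: list[str]  # ordered component hops

@dataclass
class Annotation:
    """A developer correction from Stage 3 — the human 20 percent."""
    target: str
    note: str

@dataclass
class ElevationModel:
    static: list[Component]
    dynamic: list[Flow]
    annotations: list[Annotation] = field(default_factory=list)

model = ElevationModel(
    static=[
        Component("OrderService", "order lifecycle", ["PaymentService"]),
        Component("PaymentService", "charges and refunds"),
    ],
    dynamic=[
        Flow("checkout", ["OrderService", "FraudMiddleware", "PaymentService"]),
    ],
)
# Stage 3: a developer adds what the AI missed.
model.annotations.append(
    Annotation("FraudMiddleware",
               "Added after an incident; not visible in code structure.")
)
```

Whatever the concrete format, the point is the same: the annotation layer is a first-class part of the model, not a side channel.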
Without Code Elevation, every subsequent phase operates on assumptions. With it, every phase operates on shared understanding.
Scope: Elevate What You’re Going to Touch
One of the practical strengths of Code Elevation is that it doesn’t require mapping the entire system. You elevate what’s relevant to the intent at hand.
The scoping follows a simple priority model:
High priority: Components that will be directly modified by the planned change, plus their direct dependencies. These must be elevated.
Medium priority: Components that share data or events with the modified components. These should be elevated if time allows.
Low priority: Components with no relationship to the planned change. Skip them.
For the e-commerce team I mentioned earlier, adding a recommendation engine meant elevating the ProductCatalog, OrderService, and UserService (directly impacted), plus the AnalyticsService and SearchService (shared data). The PaymentService and ShippingService were left out because they weren’t affected.
This scoping discipline matters. Elevating everything is waste. Elevating too little is risk. The team decides the right boundary, and that decision itself is a valuable alignment exercise.
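The priority model is mechanical enough to sketch. Assuming two illustrative maps, one for direct dependencies and one for shared data or events, the classification might look like this (component names and graphs are hypothetical, not taken from any real system):

```python
def scope_elevation(
    deps: dict[str, set[str]],
    shared: dict[str, set[str]],
    modified: set[str],
) -> dict[str, str]:
    """Classify components by elevation priority.

    deps:   component -> components it directly depends on
    shared: component -> components it shares data or events with
    """
    high = set(modified)
    for comp in modified:
        high |= deps.get(comp, set())       # modified plus direct dependencies
    medium = set()
    for comp in modified:
        medium |= shared.get(comp, set())   # data/event neighbors
    medium -= high
    every = set(deps) | set(shared)
    for neighbors in (*deps.values(), *shared.values()):
        every |= neighbors
    return {
        comp: "high" if comp in high else "medium" if comp in medium else "low"
        for comp in every
    }

# Hypothetical maps for a small catalog-only change.
deps = {"ProductCatalog": {"SearchService"}, "OrderService": {"PaymentService"}}
shared = {"ProductCatalog": {"AnalyticsService"}}
priorities = scope_elevation(deps, shared, modified={"ProductCatalog"})
# SearchService is elevated with the catalog; AnalyticsService is medium
# priority; OrderService and PaymentService stay out of scope this time.
```

The code is trivial; the valuable part is the argument the team has while filling in the maps.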
What Changes After Elevation
The impact of Code Elevation cascades through every subsequent phase of the development lifecycle.
In Mob Elaboration, AI understands the existing system when elaborating new requirements. Instead of generating stories in a vacuum, it identifies impact areas and dependencies automatically. The Product Owner and developers can see, in real time, how a new feature interacts with what already exists.
In Mob Construction, AI designs integration based on validated models, not guesses. It applies minimal intrusion patterns, adapter layers, and backward-compatible interfaces because it understands the boundaries it needs to respect. As I described in The Developer’s New Role, the developer’s job in construction is validation and judgment. Code Elevation gives them the context to do that job well.
In testing, AI generates regression tests based on the dynamic models. It knows which flows to protect because it has seen how the system behaves. Without those models, regression testing is guesswork.
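One simple way a dynamic model can protect a flow: treat its recorded step sequence as an expected trace and diff it against what an instrumented test run actually observed. A minimal sketch, with hand-written traces standing in for real instrumentation:

```python
def check_flow(expected_steps: list[str], observed_steps: list[str]) -> list[str]:
    """Compare an observed execution trace against a dynamic-model flow.

    Returns human-readable discrepancies; an empty list means the
    protected flow still behaves as the model says it should.
    """
    problems = []
    for i, (want, got) in enumerate(zip(expected_steps, observed_steps)):
        if want != got:
            problems.append(f"step {i}: expected {want}, observed {got}")
    if len(observed_steps) < len(expected_steps):
        missing = expected_steps[len(observed_steps):]
        problems.append(f"flow ended early; missing steps: {missing}")
    return problems

# The checkout flow from the dynamic model, fraud middleware included.
expected = ["OrderService", "FraudMiddleware", "PaymentService"]
# A trace from a build where a change accidentally bypassed the middleware.
observed = ["OrderService", "PaymentService"]
problems = check_flow(expected, observed)
```

A regression test generated this way fails loudly when a protected flow loses a step, which is precisely the failure mode the e-commerce team hit.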
In operations, AI understands the architecture well enough to monitor intelligently, detecting anomalies that relate to the actual system structure rather than generic thresholds.
The common thread: Code Elevation transforms AI from an outsider guessing at the system to a participant that shares the team’s understanding of it.
Why This Matters Beyond Engineering
For executives in regulated industries, Code Elevation isn’t just a technical practice. It’s a risk control.
When AI modifies a system it doesn’t fully understand, the organization absorbs that risk silently. A regression failure in a payment flow, a broken compliance check in a healthcare application, a data integrity issue in a financial reporting pipeline: these aren’t engineering inconveniences. They’re audit findings, regulatory exposure, and financial loss. In sectors like banking and healthcare, a single undetected regression can trigger incident reports, customer impact, and remediation costs that dwarf the original development budget.
Code Elevation creates an auditable record of what the AI understood about the system before it made changes. The validated models, the scoping decisions, the developer corrections: all of these become traceable artifacts. If something goes wrong, the organization can demonstrate that AI-generated changes were made with documented context and human oversight, not on blind assumptions. In a regulatory environment where “we didn’t know” is not an acceptable answer, that traceability matters.
The Patterns I See Teams Get Wrong
After working with several teams on brownfield AI-DLC adoption, I’ve noticed recurring mistakes:
Skipping Code Elevation entirely. This is the most common and most costly error. Teams assume that because AI can read code, it understands the system. Reading and understanding are not the same thing. A developer who joins a team can read the codebase on day one. Understanding it takes weeks or months. AI faces the same gap, and Code Elevation is how you close it.
Elevating too much. Some teams try to model the entire system before touching anything. This turns a two-hour exercise into a multi-day project and delays the actual work. Scope to the intent. Elevate what you’re going to change and what it depends on. Nothing more.
Skipping the developer review. AI generates roughly 80 percent of the models correctly. That sounds impressive until you realize the missing 20 percent is exactly the part that causes integration failures: the undocumented middleware, the implicit contracts, the workarounds. Developer review isn’t optional. It’s where the critical context enters the model.
Not persisting the models. Code Elevation models are reusable. If you elevate the OrderService today for a recommendation engine feature, those models are still valid next month when someone needs to modify the order flow for a different reason. Store them. Version them. Treat them as living documentation.
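Persisting the models can be lightweight. One approach, sketched below under the assumption that models serialize to JSON, is to store them alongside a hash of the source they were elevated from, so a later session can tell whether the code has drifted since elevation:

```python
import hashlib
import json

def _source_digest(source_files: dict[str, str]) -> str:
    """Hash the source files (in a stable order) the model describes."""
    joined = "".join(source_files[k] for k in sorted(source_files))
    return hashlib.sha256(joined.encode()).hexdigest()

def save_model(model: dict, source_files: dict[str, str]) -> str:
    """Serialize an elevation model with the hash of the code it covers."""
    return json.dumps({"source_hash": _source_digest(source_files),
                       "model": model}, indent=2)

def is_stale(saved: str, source_files: dict[str, str]) -> bool:
    """True if the code has changed since the model was elevated."""
    return json.loads(saved)["source_hash"] != _source_digest(source_files)

files = {"order_service.py": "def place_order(): ..."}
saved = save_model({"components": ["OrderService"]}, files)
assert not is_stale(saved, files)
files["order_service.py"] += "\n# new workaround"
assert is_stale(saved, files)  # code drifted; re-elevate before reuse
```

Versioning the serialized models in the same repository as the code keeps the "living documentation" claim honest: a stale model is detectable, not silently trusted.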
Elevating static without dynamic. Structure alone isn’t enough. Knowing that Component A depends on Component B doesn’t tell you how they interact at runtime, in what sequence, with what data, under what conditions. Both models are necessary.
The Deeper Lesson
Code Elevation isn’t just a technique. It reflects a principle that applies far beyond brownfield development: AI is only as good as the context it operates in.
In greenfield, context is built progressively through intents, stories, and domain models. In brownfield, context already exists, but it’s trapped in code, in people’s heads, and in undocumented decisions. Code Elevation is the process of liberating that context and making it available to both AI and the team.
This is why I believe brownfield is actually the more important test case for AI-driven development. Greenfield is the demo. Brownfield is the reality. Any methodology that works only on clean slates isn’t ready for the enterprise.
The organizations that figure out how to bring AI into their existing systems safely, systematically, and with proper context will unlock far more value than those chasing the next greenfield prototype. Because the real competitive advantage isn’t building new things fast. It’s evolving what you already have without breaking it.
Where to Start
If your team is about to use AI on an existing codebase, ask these questions before writing a single line of generated code:
- Can your AI clearly explain your system’s architecture? If it can’t describe the major components, their responsibilities, and their dependencies, it doesn’t have enough context to modify anything safely.
- Do you know what you don’t know? Identify the areas of the codebase where documentation is missing, where the original developers are gone, where workarounds exist. Those are your highest-risk zones and your highest-priority candidates for elevation.
- Have you scoped the impact? Before generating code, map which components will be affected by the planned change, directly and indirectly. If you can’t draw that map, you’re not ready to build.
- Who carries the tribal knowledge? Get those people in the room for the developer review. Their corrections to the AI-generated models will save days of debugging later.
Code Elevation is not overhead. It’s insurance. And in brownfield systems, where the cost of getting it wrong compounds through every integration point and every deployment, that insurance pays for itself many times over.
The AI can generate code. But first, it needs to understand the system it’s generating code for.
That understanding doesn’t come from reading the code. It comes from elevating it.
Enterprises that fail to elevate their legacy systems before introducing AI are not accelerating transformation. They are automating ambiguity.
Ricardo
