Mob Construction: What Changes When AI Generates Code from a Real Specification

Two weeks after the Mob Elaboration session I described in my previous post, the same financial services team sat down for their first Mob Construction session. They had a validated specification: twelve user stories with acceptance criteria, a domain model with verification tiers, non-functional requirements with measurable thresholds, and a risk register that included the regulatory constraints they had surfaced during elaboration.

The AI generated the first domain model implementation in under ten minutes. A developer reviewed it and said: “This actually matches the business logic.” That reaction tells you everything about the state of AI-assisted development. “This actually matches” was a surprise, because every previous experience with AI-generated code had taught the team to expect plausible output that missed the point. The difference was not a better model. They were using the same AI they had used for months. The difference was that the AI had a specification worth executing.

A week earlier, a different team asked the same AI to build an internal alerting service for operational incidents. No structured process, no validated specification. The prompt was two paragraphs. The AI generated a complete service in forty minutes: API endpoints, database schema, alert templates, deployment configuration. It compiled, the tests passed, and the team shipped it to staging.

Three days later, they discovered the service had no concept of alert routing preferences. Engineers could not suppress alerts for systems they were not responsible for. The on-call team flagged it. The rework took two weeks, longer than building it from scratch would have taken, because the AI had generated a coherent architecture around the wrong assumptions, and every layer had to be unwound.

Map that failure to the four stages I describe below, and the gaps become obvious. No domain modeling meant the concept of alert routing preferences never surfaced as a business entity; the AI inferred a simpler model and nobody caught it. No architecture decision records meant the assumption that all engineers receive all alerts was never documented or debated; it was just what the AI generated. No structured validation meant the tests confirmed the code worked as built, not that it worked as intended. Every gap that a structured construction process would have caught became two weeks of rework because nothing in the workflow forced the team to stop and verify.

Same AI, same team skill level, different methodology, and dramatically different results.

The Quick-Cement Problem

AI generates code at a speed that creates a specific failure mode I have started calling quick-cement. The output solidifies before anyone verifies it is correct.

In traditional development, code accumulates gradually. A developer writes a function, tests it, writes the next function. There are natural pause points where assumptions get questioned. The pace of human typing creates friction that doubles as thinking time. When something feels wrong, the developer stops and reconsiders.

AI eliminates that friction. A domain model, a logical design, a full module with tests: all generated in minutes. Each artifact looks polished. Each is internally consistent with the ones before it. The team reviews the first artifact carefully, approves it, and moves on. By the third artifact, they are approving faster, not because quality improved, but because the consistency creates confidence that everything fits together.

Then someone finds a boundary error in the domain model. That error is now embedded in the logical design, the code, the tests, and the infrastructure configuration. Every layer of approval that passed without catching it made the error more expensive to fix.

This is the construction-phase version of a principle Lean software development identified decades ago: the cost of a defect roughly doubles with every level it propagates through. Mary and Tom Poppendieck called it “build quality in from the start.” AI compresses the timeline so dramatically that a requirements error can propagate through design, code, tests, and deployment in a single afternoon. The doubling still applies; it just happens faster.
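To make the doubling concrete, here is a minimal sketch of the arithmetic. The factor of two is the rule of thumb from the paragraph above, not a measured value, and the function name and stage list are my own illustration.

```python
def propagated_cost(base_cost: float, levels: int, factor: float = 2.0) -> float:
    """Relative cost of fixing a defect after it has passed through `levels` later artifacts."""
    return base_cost * factor ** levels


# Assumed scenario: the defect originates in the requirements.
stages = ["requirements", "design", "code", "tests", "deployment"]

for levels, stage in enumerate(stages):
    # levels == 0 means the defect was caught where it was introduced.
    print(f"caught at {stage:<12} -> {propagated_cost(1.0, levels):.0f}x")
```

Under these assumptions, a requirements error caught at deployment costs sixteen times what it would have cost at the source, and with AI generation all five levels can be reached in one afternoon.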

Mob Construction is designed around this problem. Four stages, executed in sequence, with human validation at each stage acting as what the AI-DLC methodology calls a loss function. The term is borrowed from machine learning, where a loss function measures how far the model’s output deviates from the correct answer. In Mob Construction, each checkpoint measures how far the AI’s artifact deviates from the team’s validated intent, and catches the deviation before it propagates downstream.

The Governance Layer Underneath

Before I describe the four stages, there is a foundation that shapes everything the AI generates across all of them: the Enterprise Guardrails Specification, or EGS. The EGS is a document maintained as code alongside the project that defines the non-negotiable principles, constraints, and standards every session must comply with. In the financial services session, the EGS meant the AI knew about PII encryption requirements and regulatory constraints before anyone mentioned them during construction; they were already part of its context.

The EGS covers security baselines, compliance constraints, architectural guardrails, coding standards, operational readiness, cost guardrails, and AI-specific rules. Each rule has a severity level: mandatory (violation blocks the Bolt), required (exceptions need a documented ADR with risk acceptance), or recommended (best practice, deviation noted but does not block). The AI is instructed to validate every artifact it generates against the guardrails before presenting to the team, flag any conflict between a proposed design and a guardrail, and propose a compliant alternative.
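The three severity levels can be sketched as a small decision rule. This is an illustration only: the class names, rule identifiers, and the reading that a "required" rule blocks until an ADR exception is recorded are my assumptions, not a format the EGS prescribes.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    MANDATORY = "mandatory"      # violation blocks the Bolt
    REQUIRED = "required"        # exception needs a documented ADR with risk acceptance
    RECOMMENDED = "recommended"  # deviation is noted but does not block


@dataclass(frozen=True)
class Violation:
    rule_id: str                     # hypothetical identifier, e.g. "SEC-001"
    severity: Severity
    has_adr_exception: bool = False  # True once a risk-acceptance ADR is recorded


def bolt_is_blocked(violations: list[Violation]) -> bool:
    """Return True if any violation prevents the Bolt from closing."""
    for violation in violations:
        if violation.severity is Severity.MANDATORY:
            return True
        # Assumption: a "required" rule blocks until its ADR exception exists.
        if violation.severity is Severity.REQUIRED and not violation.has_adr_exception:
            return True
    return False


if __name__ == "__main__":
    noted_only = [Violation("STYLE-010", Severity.RECOMMENDED)]
    print(bolt_is_blocked(noted_only))
```

A recommended-only deviation lets the Bolt close; a mandatory violation never does, regardless of any ADR attached to it.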

This changes the dynamics of construction in a way that is hard to appreciate until you see it. In the customer notification preferences walkthrough I describe in Reimagine, Don’t Retrofit, the EGS specified PII handling requirements. During Mob Elaboration, the AI flagged encryption at rest for preference data from the start, because the guardrail was part of its context. Then during Mob Construction, that guardrail continued to shape every artifact: the AI applied field-level encryption for PII in the generated code, the domain events carried consent timestamps for audit purposes, and the infrastructure configuration included KMS encryption at rest. The team validated all of this in the same Bolt, not in a compliance review weeks later. In the traditional sprint model for the same feature, the GDPR encryption requirement surfaced during a compliance review three weeks into development, triggering rework that consumed nearly a full sprint.

Wells captures the principle well in Enabling Microservice Success (2024): guardrails should be “a stand-in for a conversation with an expert, avoiding the need for coordination with people external to the team.” In Mob Construction, the EGS is exactly that. The security team’s requirements, the compliance team’s interpretations, the architecture team’s standards: all encoded in a document that the AI consults during every stage. The security expert does not need to be in the room for every Bolt, because their expertise is embedded in the guardrails the AI enforces.

The deeper insight, one I explore at length in the book’s Chapter 10, is that the process of creating the EGS is often more valuable than the document itself. It forces the security team, the compliance team, the architecture team, and the development leads into a room together, sometimes for the first time. They discover requirements the developers have never seen, interpretations of regulations the architects disagree with, and practical constraints the security team has not considered. That shared understanding becomes the foundation for every construction session that follows. The EGS turns compliance from a phase into a property of generation.

Four Stages, Four Checkpoints

The ritual moves through four stages. Each has a specific purpose, a specific output, and an exit condition that the team must satisfy before advancing. No stage is skipped, regardless of how confident the team feels.
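The sequencing rule can be sketched as a gate loop. This is a simplified illustration: the `approve` callback stands in for the team's human validation at each checkpoint, and the function name is hypothetical.

```python
from typing import Callable

STAGES = [
    "Domain Modeling",
    "Logical Design",
    "Code Generation",
    "Test and Validation",
]


def run_bolt(approve: Callable[[str], bool]) -> list[str]:
    """Advance through the stages in order; stop at the first one the team does not approve."""
    completed = []
    for stage in STAGES:
        if not approve(stage):
            break  # exit condition not met: nothing downstream is generated
        completed.append(stage)
    return completed


if __name__ == "__main__":
    # Assumed scenario: the team rejects the logical design.
    print(run_bolt(lambda stage: stage != "Logical Design"))
```

The point of the sketch is the `break`: a rejected stage leaves later stages unstarted, so an error cannot be carried forward into artifacts that do not exist yet.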

Stage 1: Domain Modeling

Validate that the software structure represents the business correctly, before any technical decisions.

The AI takes the validated stories and domain model from Mob Elaboration and generates the implementation-level domain model: entities, value objects, aggregate roots, domain events, and the relationships between them. This is business logic modeled as software structure, independent of any infrastructure decisions.

The team’s job is to verify that the model represents the business correctly, not merely that it compiles or that it is internally consistent, but that it matches how the business actually works.

In the financial services session, the AI generated a domain model that correctly included verification tiers, the concept the team had surfaced during elaboration. But it had modeled the relationship between customer and verification tier as one-to-one. A senior developer caught it: “A customer can be in different tiers for different product types. A customer verified for basic accounts might not be verified for investment products.” The AI adjusted the model in two minutes. If that error had reached code generation, it would have required restructuring the database schema, the API contracts, and every service that touched customer verification.
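The corrected relationship might look something like the sketch below. The product types, tier names, and method names are hypothetical stand-ins; the team's actual model is not reproduced here.

```python
from dataclasses import dataclass, field
from enum import Enum


class ProductType(Enum):  # hypothetical product types
    BASIC_ACCOUNT = "basic_account"
    INVESTMENT = "investment"


class VerificationTier(Enum):  # hypothetical tier names
    UNVERIFIED = 0
    TIER_1 = 1
    TIER_2 = 2
    TIER_3 = 3


@dataclass
class Customer:
    customer_id: str
    # One tier per product type, replacing the single tier per customer
    # that the first generated model assumed.
    tiers: dict[ProductType, VerificationTier] = field(default_factory=dict)

    def verify(self, product: ProductType, tier: VerificationTier) -> None:
        self.tiers[product] = tier

    def tier_for(self, product: ProductType) -> VerificationTier:
        # A customer with no record for a product is treated as unverified for it.
        return self.tiers.get(product, VerificationTier.UNVERIFIED)


if __name__ == "__main__":
    customer = Customer("c-1")
    customer.verify(ProductType.BASIC_ACCOUNT, VerificationTier.TIER_2)
    print(customer.tier_for(ProductType.BASIC_ACCOUNT))
    print(customer.tier_for(ProductType.INVESTMENT))
```

With a one-to-one model, the second lookup would have returned the basic-account tier, which is exactly the business error the senior developer caught.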

When multiple teams work on different Units in the same session, this stage includes something that solo AI coding never produces: contract exchange. Team A shares its entity definitions with Team B if their Units have dependencies. The facilitator forces this exchange before any team advances to Logical Design. Conflicts get resolved here, when the cost of change is a conversation, not a refactoring sprint. As Ford and Richards argue in Software Architecture: The Hard Parts (2022), “Documenting a decision is important for an architect, but governing the proper use of the decision is a separate topic.” In Mob Construction, the governance happens in real time, with the people who understand the business in the room.

Stage 2: Logical Design

Document every architectural decision and its trade-offs before the AI generates a single line of code.

The AI translates the validated domain model into technical architecture: design patterns, service selection, infrastructure decisions. Every decision gets documented as an Architecture Decision Record before the team moves to code. The facilitator’s role here is to insist: when the AI proposes a service and the team nods, the facilitator asks “Why this over the alternative? Document the trade-off.”

ADRs are not new. Michael Nygard proposed the concept in 2011, and as Read documents in Communication Patterns (2023), they have become an effective practice for capturing not just what was decided, but why. Ford and Richards devoted an entire chapter to ADRs in Fundamentals of Software Architecture (2021), arguing that “the Decision section describes the reasons why a particular decision is made, which is by far the best form of architecture documentation.”

What is new is making ADRs mandatory before AI generates a single line of code.

When the AI proposes Lambda over ECS, the team does not just approve or reject. They document why: the workload pattern is variable, cold start latency is acceptable for this use case, the cost model favors pay-per-invocation at the projected volume. When someone six months from now asks why the service uses Lambda, the answer is not “the AI picked it” or “we just went with it.” The answer is a documented trade-off analysis that the team debated and approved.
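As a rough illustration, the Lambda decision could be captured in a structure like this one. The field layout loosely follows the common context, decision, consequences shape of an ADR; the record number, status value, consequence text, and completeness rule are my assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ADR:
    number: int
    title: str
    status: str
    context: str
    decision: str
    alternatives: list[str] = field(default_factory=list)
    consequences: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        """Assumed rule: context, decision, and at least one alternative must be recorded."""
        return bool(self.context.strip() and self.decision.strip() and self.alternatives)

    def render(self) -> str:
        lines = [
            f"ADR-{self.number:03d}: {self.title}",
            f"Status: {self.status}",
            f"Context: {self.context}",
            f"Decision: {self.decision}",
            "Alternatives considered:",
            *[f"  - {item}" for item in self.alternatives],
            "Consequences:",
            *[f"  - {item}" for item in self.consequences],
        ]
        return "\n".join(lines)


adr = ADR(
    number=1,  # hypothetical record number
    title="Use Lambda instead of ECS",
    status="accepted",
    context="The workload pattern is variable.",
    decision=(
        "Run on Lambda: cold start latency is acceptable for this use case and "
        "pay-per-invocation is cheaper at the projected volume."
    ),
    alternatives=["ECS"],
    consequences=["Revisit if the workload pattern or volume changes."],  # illustrative
)

if __name__ == "__main__":
    print(adr.render())
```

A check like `is_complete()` gives the facilitator something mechanical to point at: an ADR with no alternatives listed is a nod, not a decision.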

This is where I see the sharpest contrast with unstructured AI coding. A developer working alone with an AI assistant gets architecture decisions made implicitly. The AI picks a database, a compute service, a caching strategy. The developer accepts or changes them based on instinct. No record of the alternatives considered, no documentation of the trade-offs, no trace of the reasoning. Six months later, a different developer inherits the codebase and has no idea why DynamoDB was chosen over Aurora, or whether the choice still makes sense given how the workload evolved.

A Reddit thread captured this precisely: “Are we losing the ‘why’ code exists?” (r/ExperiencedDevs, April 2026). AI-generated code does not carry the reasoning. Decisions scatter across PR comments, Slack threads, AI conversation logs. The real cost, as one commenter put it, “shows up six months later when someone needs to modify that code and has zero context.” ADRs written during Mob Construction solve this by making the reasoning an explicit, versioned artifact that lives alongside the code.

Stage 3: Code Generation

Generate executable code only after the domain model and architecture are validated and documented.

Only now does the AI generate executable code. It has a validated domain model, approved architectural decisions documented as ADRs, and non-functional requirements with measurable thresholds. The generation space is constrained. The AI implements what the team has already decided, rather than inferring intent from a two-paragraph prompt.

The team reviews the generated code for quality, security, and maintainability. The facilitator’s job in this stage is to prevent rubber-stamping. “Walk me through what this module does” is the question that separates genuine review from glancing at code that looks clean.

Two things happen in this stage that almost never happen in unstructured AI coding.

First, Infrastructure as Code gets generated alongside the application code, not as an afterthought. The AI produces CloudFormation, CDK, or Terraform configurations that match the architectural decisions from Stage 2. The infrastructure lives in the same construction session, reviewed by the same people who understand the application logic. When the team chose Lambda in Stage 2, the IaC in Stage 3 reflects that choice with the correct memory allocation, timeout settings, and IAM permissions. When they documented a latency NFR of under 200 milliseconds, the infrastructure configuration includes the monitoring and alerting that will enforce it.

Second, security review happens during generation, not after. Input validation, IAM permissions, encryption configuration: these are reviewed as the code is generated, not discovered in a security scan weeks later. The EGS makes this structural rather than aspirational: when the guardrails specify “no wildcard permissions in identity policies for production accounts,” the AI generates least-privilege IAM roles by default, and the team validates them in the same session. This is the operational reality of what Security as a Development Constraint argued conceptually: security embedded in the construction process rather than bolted on as a gate.
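A guardrail like the wildcard rule is easy to check mechanically. The sketch below scans an IAM policy document, in the standard JSON shape with `Statement`, `Effect`, `Action`, and `Resource`, for wildcards in Allow statements. The function names and my definition of "wildcard" (any `*` in an action, or a resource of exactly `*`) are assumptions; it is not a substitute for a real policy analyzer.

```python
def _as_list(value):
    """IAM allows a single value or a list for Statement, Action, and Resource."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def wildcard_findings(policy: dict) -> list[str]:
    """Return one finding per wildcard action or resource in Allow statements."""
    findings = []
    for index, statement in enumerate(_as_list(policy.get("Statement"))):
        if statement.get("Effect") != "Allow":
            continue  # Deny statements are out of scope for this sketch
        for action in _as_list(statement.get("Action")):
            if "*" in action:
                findings.append(f"Statement {index}: wildcard action {action!r}")
        for resource in _as_list(statement.get("Resource")):
            if resource == "*":
                findings.append(f"Statement {index}: wildcard resource '*'")
    return findings


# Hypothetical policies for illustration.
too_broad = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
}
least_privilege = {
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::example-bucket/*",
    },
}

if __name__ == "__main__":
    for finding in wildcard_findings(too_broad):
        print(finding)
```

The sketch ignores `NotAction`, `NotResource`, and conditions, so a clean result here means only that the obvious wildcards are absent. The team still reviews the role in the session.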

Stage 4: Test and Validation

Verify that the code does what the business intended, not just what the implementation does.

The AI generates and executes three categories of tests: functional, security, and performance. The team validates that the tests are meaningful, not just passing.

“Meaningful” is the operative word. AI-generated tests have a specific failure mode: they test what the code does, not what the code should do. A test suite that achieves 90% coverage by testing the implementation rather than the requirements gives false confidence. The facilitator maps tests back to acceptance criteria from the original stories: “This story says customers in Tier 3 require manual approval. Where is the test for that?”
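The facilitator's mapping can be approximated with a simple traceability check. In this sketch the criterion identifiers, the second criterion, and the test names are hypothetical; only the Tier 3 rule comes from the story quoted above.

```python
def uncovered_criteria(
    criteria: dict[str, str],
    test_traces: dict[str, set[str]],
) -> list[str]:
    """Return IDs of acceptance criteria that no test claims to verify."""
    covered = set().union(*test_traces.values())
    return sorted(cid for cid in criteria if cid not in covered)


# Acceptance criteria keyed by a hypothetical ID scheme.
criteria = {
    "AC-7": "Customers in Tier 3 require manual approval",
    "AC-8": "Tier lookups are scoped to a product type",  # illustrative only
}

# Each generated test declares which criteria it traces to.
test_traces = {
    "test_tier_lookup_is_per_product": {"AC-8"},
    "test_serializer_round_trip": set(),  # exercises the implementation, traces to nothing
}

if __name__ == "__main__":
    for cid in uncovered_criteria(criteria, test_traces):
        print(f"No test for {cid}: {criteria[cid]}")
```

Coverage tooling would count both tests toward the percentage. The traceability check shows that the Tier 3 rule has no test at all, which is the gap the facilitator's question is designed to expose.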

Contract conformance is a blocker in this stage. The API contract defined in Stage 2 is the source of truth. If the backend responses, frontend calls, or test mocks do not match the contract, the Bolt does not close. This catches integration errors that would otherwise surface weeks later when teams try to connect their Units.
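A minimal sketch of the conformance check, assuming the contract can be reduced to field names and expected types. A real session would more likely validate against an OpenAPI or JSON Schema document; the field names and payloads here are hypothetical.

```python
def contract_violations(contract: dict[str, type], payload: dict) -> list[str]:
    """Compare one payload (backend response, frontend call, or test mock) with the contract."""
    problems = []
    for name, expected in contract.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected):
            actual = type(payload[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    # Fields the contract never declared are also treated as drift.
    for name in sorted(payload.keys() - contract.keys()):
        problems.append(f"undeclared field: {name}")
    return problems


# Hypothetical contract for a verification lookup, as agreed in Stage 2.
contract = {"customer_id": str, "product_type": str, "tier": int}

backend_response = {"customer_id": "c-1", "product_type": "investment", "tier": 3}
test_mock = {"customer_id": "c-1", "tier": "3", "verified": True}

if __name__ == "__main__":
    for label, payload in [("backend", backend_response), ("mock", test_mock)]:
        print(label, contract_violations(contract, payload))
```

Here the backend conforms but the mock does not, so tests written against that mock would pass while proving nothing about integration. Under the rule above, the Bolt stays open until the mock matches.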

At this point, the pattern should be clear: Mob Construction does not slow AI down. It slows commitment down to match validation. Each stage exists because the cost of catching an error there is a fraction of the cost of catching it one stage later. The four checkpoints are not bureaucracy; they are the mechanism that prevents quick-cement from setting before anyone verifies the foundation is sound. Mob Construction is a constraint system for AI-assisted construction, not a coding practice.

The Bolt as Time-Box

Each pass through the four stages is scoped to a Bolt: roughly one day of focused construction work. A Unit might require two or three Bolts to complete. Independent Bolts can run in parallel when multiple sub-teams are available.

The time-box serves two purposes. It prevents scope creep: if a story does not fit in the current Bolt, it moves to the next one rather than expanding the session. And it maintains team energy. Mob Construction is cognitively demanding. The combination of AI-generated artifacts and real-time validation requires sustained attention. After eight hours, scrutiny degrades regardless of how disciplined the team is. Better to close a Bolt with three modules thoroughly validated than to push through five modules with declining review quality.

A Reddit thread from an engineering manager captured the problem from the other side: “Features that took a sprint now take one to two days, but our release slipped three weeks because validation could not keep pace” (r/EngineeringManagers, April 2026). The team had built a productivity strategy without a quality strategy to match it. The Bolt structure is the quality strategy: it constrains the pace of generation to the pace of meaningful validation.

What Teams Discover

The stages describe the mechanics. What surprises teams is what the mechanics reveal.

ADRs surface disagreements the team did not know it had. When the AI proposes an architecture and the team has to document why they accepted it, hidden assumptions collide. “I assumed we would use a message queue here.” “I assumed we would use direct API calls.” Without the ADR requirement, both assumptions would have survived until integration testing revealed the conflict. The ADR forces the conversation when the cost of resolution is a fifteen-minute debate, not a two-week refactoring effort.

Generating IaC alongside application code changes who owns deployment. Teams expect the infrastructure to be someone else’s problem. When the developer who reviewed the Lambda function also reviews the IAM role that governs its permissions in the same Bolt, the gap between “it works locally” and “it works in production” collapses. The ownership shift is the surprise: developers stop thinking of infrastructure as a separate concern because the construction process never separated it.

Approval fatigue is measurable, and the facilitator is the instrument that measures it. I have watched teams spend twenty minutes rigorously debating the first domain model, then approve the third one in four minutes. The facilitator’s intervention, “We are approving faster. Is that because the quality improved, or because we are tired?”, has caught real errors in every session I have facilitated. The progressive rotation model means this capability transfers to the team over six to eight sessions. By session nine, team members catch their own fatigue without prompting.
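If you want a number to go with the facilitator's instinct, one crude signal is review duration relative to the first review of the session. This is a sketch under my own assumptions: the 50% threshold is arbitrary, and the middle duration in the example is invented to complete the series.

```python
def fatigue_flags(review_minutes: list[float], ratio: float = 0.5) -> list[int]:
    """Return 1-based positions of reviews shorter than `ratio` times the first review."""
    if not review_minutes:
        return []
    baseline = review_minutes[0]
    return [
        position
        for position, minutes in enumerate(review_minutes, start=1)
        if minutes < baseline * ratio
    ]


if __name__ == "__main__":
    # Twenty minutes on the first domain model, four on the third.
    print(fatigue_flags([20, 12, 4]))
```

A flag is a prompt for the facilitator's question, not a verdict: a short review can also mean the artifact really was simpler.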

The Construction Floor Changed

The developer community is converging independently on the insight that AI code generation without structure produces technical debt faster than any human team could. A senior developer on Reddit described it bluntly: “Saving challenging projects was my niche, but AI codebases are making me miserable” (r/ExperiencedDevs, April 2026). AI-generated code that lacks human architectural intent creates a new category of debt that is harder to rescue than traditional legacy code, because there is no reasoning to trace, no decisions to understand, no “why” behind the “what.”

Mob Construction addresses this by ensuring the “why” exists before the “what” gets generated. The domain model captures the business logic, the ADRs capture the architectural reasoning, the NFRs capture the quality expectations, and the tests capture the acceptance criteria. When the next developer inherits this codebase, or when the original team revisits it after three other projects, they inherit not just code, but the documented chain of decisions that produced it.

Mob Elaboration showed what happens in the requirements room: how teams transform vague intent into precise specifications. This post shows what happens when those specifications reach the construction floor: how four stages of structured generation and validation produce code that the AI could not have produced from a prompt alone. Together, they operationalize the shift from vibe coding to spec-driven development that the methodology was designed to enable.

The industry is investing billions to solve AI code quality with better models. This team solved it with better constraints. The model, the tokens, the temperature setting: all identical between the two-paragraph prompt and the validated specification. What changed was the methodology, the construction process, and the team’s relationship with the AI’s output. The rest of the industry is still waiting for the next model release. These teams shipped last week.

Ricardo