Every engineering team has that system. The one nobody wants to touch. The one where the original developer left four years ago and took half the institutional knowledge with them. The one running a language your newest hires have never seen in production.
You know the cost of keeping it alive. Stripe’s Developer Coefficient report put a number on it: developers spend 42% of their working week dealing with technical debt and bad code, totaling roughly $85 billion in lost opportunity worldwide each year. McKinsey’s research paints an even grimmer picture for enterprise teams, estimating that technical debt accounts for 20 to 40 percent of a company’s entire technology estate value.
So why do these systems keep running? Because ripping them out is terrifying. And until recently, modernizing them meant years of manual labor, blown budgets, and projects that died mid-flight.
AI is changing the math on that. Not by magically rewriting your COBOL into Python overnight (it can’t), but by compressing the most painful, time-consuming phases of modernization into something manageable. Here’s where it actually helps, where it falls short, and how to use it without wrecking your codebase in the process.
Where AI Actually Moves the Needle
The biggest bottleneck in legacy modernization isn’t writing new code. It’s understanding old code. Before anyone can refactor, re-platform, or rewrite a legacy system, someone has to map what it does, how it does it, and which pieces of business logic are buried in functions nobody’s documented since 2011.
This is where AI delivers real, measurable results.
Morgan Stanley’s internal tool, DevGen.AI, is the clearest proof. Built on OpenAI’s GPT models and launched in January 2025, the tool processed nine million lines of legacy code in its first five months. It translated outdated programming languages into plain-English specifications that any of the bank’s 15,000 developers could use to rewrite systems in modern languages. The result: an estimated 280,000 developer hours saved, according to Morgan Stanley’s global head of technology and operations.
That’s not a marginal improvement. That’s a fundamental shift in how large organizations can approach technical debt.
Beyond code comprehension, AI is proving useful across three other modernization phases:
- Dependency mapping. AI tools can crawl a legacy codebase and produce a visual map of how components interact, flagging tightly coupled modules that need careful sequencing during migration. Doing this manually on a system with millions of lines of code can take months. AI reduces it to days.
- Test generation. One of the riskiest parts of modernization is ensuring the new system behaves exactly like the old one. AI can generate unit and integration tests based on existing code behavior, creating a safety net that catches regressions early. GitHub's research found that developers using Copilot completed coding tasks 55% faster, and a later enterprise study of 4,800 developers saw code approval rates rise 5% (meaning AI-assisted code passed peer review more often).
- Documentation creation. Legacy systems are notoriously under-documented. AI can generate functional documentation from source code, turning cryptic business logic into readable specs. This alone can shave weeks off the discovery phase of any modernization project.
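To make the dependency-mapping idea concrete, here is a minimal sketch using only Python's standard-library `ast` module. It builds a file-to-imports map for a Python codebase; production AI-assisted tools do this kind of analysis across millions of lines and many languages, but the output shape (a graph of who depends on whom) is the same.

```python
import ast
import os
from collections import defaultdict

def import_graph(root):
    """Build a module -> imported-modules map for a Python codebase.
    A small stand-in for what AI-assisted tools do at much larger scale."""
    graph = defaultdict(set)
    for dirpath, _, files in os.walk(root):
        for name in files:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                try:
                    tree = ast.parse(f.read())
                except SyntaxError:
                    continue  # legacy files may not parse; flag for manual review
            module = os.path.relpath(path, root)
            for node in ast.walk(tree):
                if isinstance(node, ast.Import):
                    graph[module].update(alias.name for alias in node.names)
                elif isinstance(node, ast.ImportFrom) and node.module:
                    graph[module].add(node.module)
    return graph
```

The resulting graph is exactly what you need to spot tightly coupled modules: any file with an unusually large fan-in or fan-out deserves careful sequencing during migration.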
Research supports the broader trend. Studies have found that generative AI can handle 69 to 75 percent of code edits during large-scale migrations, cutting project duration by roughly half. Fujitsu’s proof-of-concept trials reported a 20% reduction in modernization timelines with GenAI, climbing to 50% when agentic AI was added to the workflow.
Where AI Falls Flat (and Why You Still Need Humans)
Here’s the part most AI evangelists skip over: these tools have real, sometimes dangerous, limitations when applied to legacy application modernization and complex enterprise systems.
AI doesn’t understand your business. It can parse syntax, identify patterns, and generate plausible code. But it has no concept of why your billing system calculates tax differently for customers in three specific states, or why that one API endpoint returns data in a format that makes no logical sense but keeps a critical downstream process from breaking.
Business logic is the hardest part of any modernization effort, and it’s the part AI handles worst. When Morgan Stanley built DevGen.AI, they specifically designed it to produce English-language specifications, not finished code. The tool doesn’t rewrite the system. It translates the old code into something humans can reason about, and then humans make the architectural decisions.
This distinction matters. GitHub’s own research acknowledges that while Copilot generates an average of 46% of code written by users (reaching 61% for Java developers), a November 2024 study found that 29.1% of Python code generated by AI contained potential security weaknesses. That’s a sobering number when you’re modernizing systems that handle financial transactions, patient records, or critical infrastructure.
The risks break down into a few predictable categories:
- Hallucinated logic. AI can produce code that looks correct, compiles cleanly, and passes basic tests, but implements subtly wrong business rules. In a greenfield project, you’d catch this quickly. In a modernization, where the “correct” behavior is defined by what the old system does (quirks and all), these errors can slip through for months.
- Context window limitations. Large language models can only process a finite amount of code at once. Legacy systems often have deeply intertwined dependencies spanning hundreds of files. AI tools frequently miss cross-module interactions that a senior engineer would spot.
- Security blind spots. AI tends to reproduce patterns it’s seen in training data, including insecure ones. Modernization is supposed to improve security posture, not carry old vulnerabilities into a new codebase.
- Over-confidence in output. The most dangerous AI failure mode isn’t producing bad code. It’s producing bad code that looks great. Junior developers especially may accept AI suggestions without the skepticism that comes from experience.
McKinsey, Deloitte, and Thoughtworks have all converged on the same conclusion: GenAI compresses modernization timelines and reduces manual work, but only when paired with strong guardrails and expert oversight.
A Practical Framework for AI-Assisted Modernization
So how do you actually use AI in a modernization project without introducing more risk than you’re eliminating? The answer isn’t “plug in Copilot and hope for the best.” It’s a structured approach that puts AI in supporting roles where it excels and keeps humans in charge of decisions where context matters.
Phase 1: Discovery and Assessment (AI-heavy)
This is where AI earns its keep. Use it to scan the existing codebase, generate documentation, map dependencies, and identify the modules carrying the most technical debt. McKinsey’s analysis of 220 organizations found that companies in the 80th percentile for their Tech Debt Score achieved 20% higher revenue growth than bottom performers. The first step to getting there is knowing exactly where the debt lives.
Research published in the World Journal of Advanced Engineering Technology and Sciences in 2025 found that roughly 80% of the technical debt impact in a typical codebase comes from just 20% of modules. AI can help you find that 20% fast.
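As an illustration of that 80/20 prioritization, here is a small sketch assuming a simple debt proxy of change frequency times complexity. The module names and numbers are invented; the point is the shape of the analysis, which returns the smallest set of modules accounting for a target share of total debt:

```python
def debt_hotspots(churn, complexity, coverage=0.8):
    """Rank modules by a simple debt proxy (churn x complexity) and
    return the smallest set accounting for `coverage` of total debt."""
    scores = {m: churn[m] * complexity.get(m, 1) for m in churn}
    total = sum(scores.values())
    picked, running = [], 0.0
    for module, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        picked.append(module)
        running += score
        if running >= coverage * total:
            break
    return picked

# Hypothetical inputs: commits touching each module, and a complexity metric
churn = {"billing": 120, "auth": 15, "reports": 8, "ui": 40, "batch": 95}
complexity = {"billing": 30, "auth": 5, "reports": 3, "ui": 4, "batch": 25}
print(debt_hotspots(churn, complexity))  # → ['billing', 'batch']
```

Two of five modules carry 80% of the debt score here, which mirrors the 80/20 finding: modernize those first and most of the pain goes away.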
Phase 2: Planning and Architecture (Human-heavy)
Once you know what you’re working with, the architectural decisions need to be made by experienced engineers. Should you go microservices, modular monolith, or something in between? Which modules get modernized first? What’s the rollback strategy if something breaks in production?
AI can generate options and surface relevant patterns, but it can’t weigh your team’s skill set, your deployment infrastructure, or your organization’s risk tolerance. This phase is where the “strangler fig” pattern (incremental replacement of legacy components) gets planned in detail.
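A strangler-fig cutover often starts with nothing more complicated than a routing layer: migrated routes go to the new service, everything else still hits the legacy system. A minimal sketch, assuming path-prefix routing and hypothetical service names:

```python
# Routes whose prefixes appear here have been migrated; the set grows
# one module at a time as the legacy system is strangled. (Hypothetical.)
MIGRATED_PREFIXES = ("/invoices", "/customers")

def route(path: str) -> str:
    """Decide which backend serves a request during incremental migration."""
    if path.startswith(MIGRATED_PREFIXES):
        return "new-service"
    return "legacy-system"

assert route("/invoices/42") == "new-service"
assert route("/tax/calc") == "legacy-system"
```

The same idea shows up at every layer (API gateway rules, database views, message-queue consumers); the constant is that rollback is always one config change away.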
Phase 3: Migration and Refactoring (AI-assisted, human-reviewed)
This is the execution phase, and it’s where the collaboration between AI and human developers produces the best results. AI handles the high-volume, repetitive work:
- Translating code from legacy languages into modern equivalents
- Generating boilerplate for new services and APIs
- Writing test suites based on observed legacy system behavior
- Flagging deprecated libraries and insecure dependencies
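The “tests based on observed legacy system behavior” item is usually implemented as characterization (golden-master) testing: record the legacy system’s outputs once, then replay them against the replacement. A minimal sketch (the file name and function signatures here are assumptions):

```python
import json

def record_golden(legacy_fn, inputs, path="golden.json"):
    """Capture legacy outputs for a set of inputs (run once, commit the file)."""
    golden = {json.dumps(args): legacy_fn(*args) for args in inputs}
    with open(path, "w") as f:
        json.dump(golden, f, indent=2)

def check_against_golden(new_fn, path="golden.json"):
    """Replay every recorded case against the new implementation."""
    with open(path) as f:
        golden = json.load(f)
    failures = []
    for key, expected in golden.items():
        args = json.loads(key)
        actual = new_fn(*args)
        if actual != expected:
            failures.append((args, expected, actual))
    return failures
```

Note that the golden file pins the legacy system's actual behavior, quirks and all, which is exactly the "correct" baseline a modernization has to match.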
Every piece of AI-generated output gets reviewed by a developer who understands the business context. No exceptions. The Morgan Stanley team was explicit about this: DevGen.AI replaces “onerous rote work,” not engineering judgment.
Phase 4: Validation and Deployment (Human-heavy, AI-supported)
AI can run regression tests, compare outputs between old and new systems, and flag discrepancies. But the final sign-off on whether the modernized system behaves correctly in production belongs to your team. This is also where you validate that security posture has actually improved, not just been carried forward.
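Output comparison between old and new systems is often done with a shadow (parallel-run) wrapper: production traffic is served by the legacy path while the new path runs alongside it, and divergences are logged for review. A minimal sketch:

```python
import logging

def shadow_call(legacy_fn, new_fn, *args, **kwargs):
    """Serve the legacy result; run the new system in shadow and log divergence."""
    legacy_result = legacy_fn(*args, **kwargs)
    try:
        new_result = new_fn(*args, **kwargs)
        if new_result != legacy_result:
            logging.warning("divergence for %r: legacy=%r new=%r",
                            args, legacy_result, new_result)
    except Exception:
        # A crash in the new path must never affect production traffic
        logging.exception("new system failed in shadow for %r", args)
    return legacy_result
```

Because the caller always gets the legacy result, this can run safely in production for weeks, and the cutover decision is made on the divergence log rather than on hope.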
The Numbers That Make the Case
If you’re trying to justify a modernization initiative internally, the data is on your side. And you don’t need to stretch or cherry-pick; the numbers are stark enough on their own.
McKinsey found that 30% of CIOs believe more than 20% of their technology budget gets diverted to resolving tech debt, money that could fund growth and innovation instead. The same research showed that companies spending more than half their IT project budget on integrations and legacy fixes are likely stuck in a tech debt spiral, paying interest without reducing the principal. That’s not a technology problem anymore. It’s a business model problem.
On the productivity side, 68% of organizations report that legacy systems actively obstruct AI adoption. Think about what that means in practical terms: the companies that can’t modernize their infrastructure also can’t take advantage of the AI tools that would accelerate every other part of their business. It’s a compounding disadvantage that widens every quarter you delay.
The retention angle matters too. Stack Overflow’s developer surveys consistently show that working with outdated technology is one of the top reasons engineers leave jobs. When your best people are spending half their week wrestling with legacy systems instead of building new features, they start updating their LinkedIn profiles. The cost of replacing a senior developer (recruiting, onboarding, lost productivity) typically runs 50 to 200 percent of their annual salary, depending on the role and market.
And the opportunity cost isn’t theoretical. Companies that actively manage technical debt free up engineers to spend up to 50% more time on work that directly supports business goals, according to McKinsey’s research. That’s the difference between a team that ships quarterly and a team that ships weekly.
What This Looks Like in Practice
The winning pattern across successful modernization projects in 2025 and 2026 looks remarkably consistent:
- Start with AI-powered discovery to map the full scope of your legacy estate. Don’t guess at complexity; measure it.
- Prioritize ruthlessly. Not every legacy system needs modernization right now. Focus on modules with the highest debt, the most security risk, or the biggest impact on developer velocity.
- Use AI as a multiplier, not a replacement. Let it handle code translation, test generation, and documentation. Keep humans on architecture, business logic validation, and security review.
- Adopt the strangler fig pattern. Replace components incrementally. Big-bang rewrites still fail at alarming rates because the scope is too large to manage, even with AI assistance.
- Measure before and after. Track deployment frequency, bug rates, onboarding time for new developers, and time spent on maintenance. These numbers tell you whether modernization is delivering real value or just shuffling complexity around.
The Bottom Line
AI isn’t a magic wand for legacy systems. It won’t understand why your 15-year-old ERP has that one stored procedure that makes no sense but keeps the quarterly close running. It won’t make the hard architectural calls about what to keep and what to kill. And it definitely won’t guarantee that the new system works perfectly on day one.
What it will do is eliminate thousands of hours of the most tedious, error-prone work in modernization: reading old code, writing documentation, generating tests, mapping dependencies. That’s the work that makes modernization projects drag on for years and burn through budgets.
The teams getting the best results aren’t treating AI as a replacement for engineering expertise. They’re treating it as a force multiplier that lets experienced developers focus on the decisions that actually matter. That combination, human judgment plus AI speed, is what turns a two-year modernization slog into something your team can actually finish.
