What is Legacy Code Archaeology?

04.10.2024

Another article about legacy code? That’s a fair question. So much has already been written on the topic, and yet I want to add another article, maybe even several. Partly because I want to give back: perhaps you, as a developer, can learn something from it. And partly because I want to talk specifically about PHP legacy code and its quirks, as I am a PHP backend developer myself.

My name is Marcus, and since 1999 (starting with PHP 3), I’ve been developing websites and web applications with PHP. I can certainly say that I have produced my fair share of legacy code, some of it probably quite terrible. And I hope no one had to deal with those catastrophes. My current project is migrating a large legacy web application to Symfony, and in this series of articles, I want to take you along on the adventurous journey of working with legacy code. Join me in becoming a legacy code archaeologist!

Below, I want to outline some of the topics I’ll be exploring in more detail in this series.

What is Legacy Code Anyway?

Let’s start with the logical question of what legacy code actually is. Legacy code isn’t necessarily synonymous with bad code. Over time, even the most maintainable, object-oriented, clean code with well-applied patterns almost inevitably becomes legacy code. When you leave your current company or stop maintaining your open-source project, the developers who eventually take over will be dealing with legacy code.

However, I don’t want to focus on that so much here. I’m more interested in defining legacy code as code that is, at least, less than optimal and has evolved historically. Let’s assume we’re dealing with a project that has grown from a relatively simple base system into a large monster that is barely maintainable, certainly not extendable anymore, but still needs to keep running because it’s still useful, has users, and contributes to the success of a company.

Legacy code, then, is the inheritance you must deal with, and your next task is to find a way into it.

Legacy PHP Code

PHP has changed a lot since I started with PHP 3. There are legacy systems where you can see every new feature of each PHP version—a collection of decades of evolution, perhaps driven by many developers. I’m not sure if there are other languages that have changed and, more importantly, improved as much as PHP has. But the leap PHP made from a web toolkit (PHP 3) to a mature programming language (PHP 8.*) is quite impressive.

Yet, especially in legacy code, we often find many things that make it difficult to modernize the system. Spaghetti code, global variables, lack of type safety, a "hack it together" mentality of developers, and the fact that many old systems were created by solo developers—all of this makes working with legacy PHP code challenging.

Correct me if I’m wrong, but I haven’t heard of many large projects in other languages being developed by a single person. And these solo developers are always working against the clock, with managers demanding ever more complex features quickly, just to keep the project competitive in the market. This inevitably leads to suboptimal code.

It was the same for me. As a lone developer, I had to build things that were meant for a team, with such tight time constraints that it was almost impossible to write clean and maintainable code. And of course, I was embarrassed to leave such a mess behind.

Ways Into Legacy Code

If you are hired to work on a legacy system, you might be lucky enough to still have the original developer(s) on the team. But more often than not, you’ll have no one to ask questions or to guide you through onboarding. It’s definitely helpful to have someone on your side who knows the code well and can explain its peculiarities. But there’s one thing no one can do for you: if you want to work on the system, you have to understand it—and find a way into the code.

There are different ways to start your expedition into the code, to begin your code archaeology:

  • Start with specific routes that interest you and work your way from route to route.
  • Use a suitable static code analyzer to find entry points.
  • Pick a file somewhat randomly and work your way through from there.

A combination of these methods can lead to success—but it greatly depends on how the legacy code is structured: Is it mostly spaghetti code? Are include and require used all over the place? Is the architecture object-oriented, or are classes just used haphazardly? Does the code already use some framework, or is it a wild mix of everything?
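
To get a first feel for how heavily a file relies on include and require, a small script can help. The following is only a minimal sketch, not a replacement for a real static analyzer: it uses PHP's built-in tokenizer (`token_get_all`) so that matches inside comments and strings are ignored. The function name `countIncludes` and the sample snippet are made up for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Count include/require statements in a piece of PHP source code.
 * Relies on the tokenizer, so comments and string contents are not counted.
 *
 * @return array<string, int> statement name => number of occurrences
 */
function countIncludes(string $source): array
{
    $names = [
        T_INCLUDE => 'include',
        T_INCLUDE_ONCE => 'include_once',
        T_REQUIRE => 'require',
        T_REQUIRE_ONCE => 'require_once',
    ];
    $counts = array_fill_keys(array_values($names), 0);

    foreach (token_get_all($source) as $token) {
        // Tokens are either single characters (strings) or [id, text, line] arrays.
        if (is_array($token) && isset($names[$token[0]])) {
            $counts[$names[$token[0]]]++;
        }
    }

    return $counts;
}

// Hypothetical legacy snippet, for illustration only.
$legacy = <<<'CODE'
<?php
require_once 'config.php';
include 'header.php';
// include 'old_menu.php';
echo "please include a name";
include 'footer.php';
CODE;

print_r(countIncludes($legacy));
```

For the sample above, this reports two `include` statements and one `require_once`; the commented-out line and the string are skipped. To run it over a real file, you could pass the result of `file_get_contents()` instead of the sample string.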

Soon, there will be a link here to an article that dives deeper into ways into legacy code.

Ways Out of Legacy Code

Have you found a way into the legacy code and understood it well enough? Then you need to find a suitable way out.

Some legacy systems can be entirely redeveloped in parallel. When the new system is ready, you simply switch from the old system to the new one—done. These are probably small and manageable projects that don’t quite fall into the category of “no longer maintainable or extendable.”

The truly tangled legacy projects, which are stuck and thus the most fun, usually can’t be replaced with such a big-bang approach. The parallel development would either never finish, or the big-bang switch would lead to significant issues. Such modernization projects often fail, unfortunately.

Thankfully, there are several ways to successfully migrate a legacy project to a new system, moving away from the big bang and more towards an evolutionary approach:

  • Gradual refactoring, where the code is improved and renewed piece by piece, module by module, component by component.
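  • Using the Strangler Fig Pattern, where the new system grows around the old one and takes over its functionality piece by piece until the legacy part can be removed.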
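
The core idea of the Strangler Fig Pattern can be sketched in a few lines: a single entry point decides, per request, whether the new application or the legacy application handles it. This is only an illustration under assumptions, not how any particular framework does it. The function `resolveHandler` and the example routes are hypothetical, and `str_starts_with` requires PHP 8.0 or newer.

```php
<?php
declare(strict_types=1);

/**
 * Decide which system should handle a request path.
 *
 * @param list<string> $migratedPrefixes path prefixes already served by the new application
 */
function resolveHandler(string $path, array $migratedPrefixes): string
{
    foreach ($migratedPrefixes as $prefix) {
        // Match the prefix itself or anything below it, but not "/invoices-old".
        if ($path === $prefix || str_starts_with($path, rtrim($prefix, '/') . '/')) {
            return 'new';
        }
    }

    // Everything not yet migrated still goes to the legacy application.
    return 'legacy';
}

// Hypothetical routes, for illustration only.
$migrated = ['/invoices', '/api/customers'];

echo resolveHandler('/invoices/42', $migrated), PHP_EOL;   // new
echo resolveHandler('/invoices-old', $migrated), PHP_EOL;  // legacy
echo resolveHandler('/reports/2019', $migrated), PHP_EOL;  // legacy
```

As more parts of the system are migrated, the list of prefixes grows, and the legacy application handles less and less until it can be switched off.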

Soon, there will be a link here to an article that dives deeper into ways out of legacy code.

The Mindset of a Legacy Code Archaeologist

What makes you a good (or even suitable) code archaeologist? Working with legacy code is certainly not for every developer! You might need a certain streak of code masochism to love it. You absolutely must be curious and take great joy in solving complex problems. You also need to be resilient, as working on legacy code can be exhausting and sometimes frustrating. Patience helps as well, since it can take quite a while before you understand the code well enough to work with it.

You also need a feel for the right balance between preserving and renewing. When migrating legacy code, it’s not always advisable to aim for a 100% solution. Some areas might not need migration or renewal and can stay as they are.

Especially when you work in a team and the original developer(s) are still around, you should try to understand why the code ended up the way it did. Just because it looks bad to you doesn’t mean bad developers wrote it! So don’t be condescending toward anyone: the legacy system is your common final boss, and you can only defeat it as a team!

Lastly, you should enjoy learning something new, because one thing is certain: you will learn a lot!

Would you like to be notified when a new article is published? Then write me an email at hello@marcuskober.de.