Legacy Code: Code Too Messy? Follow the Database

The Problem

Recently, I was working with an outsourced codebase. My task was to add features to an existing checkout system. The code flow was so tangled it was nearly impossible to follow. I spent many hours stepping through the code in a debugger, trying to understand it.

In the beginning, I tried to modify the existing code to implement the new features and bug fixes. Every modification I made broke another feature. Since the legacy system had no unit tests, I was afraid to make changes for fear of breaking existing functionality, and above all functionality I did not know existed. The root of the problem was that the codebase relied heavily on global variables. With this in mind, I needed to find a better way.

Attempted Solutions

I realized the only solution was to re-implement the whole checkout process. I first tried searching for embedded SQL statements throughout the code. This did not give me the information I needed to rewrite the process; the code was just too messy. The second approach was to scan the database tables by hand before and after submitting a new order. This helped me gain familiarity with the tables but wasn't practical. The third approach was to dump the database into .sql files before and after submitting a new order and then compare them using diff. This failed miserably because my test database was too large; the diff tool could not handle that much data. On smaller projects, I think this could be a dandy solution. Back to the drawing board.
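For reference, the dump-and-diff attempt looks roughly like the sketch below. The database name and table contents here are hypothetical, and the two dumps are simulated with printf so the diff is visible without a live server:

```shell
# In practice the two dumps would come from mysqldump, e.g.:
#   mysqldump -u root -p shop_db > before.sql   (before submitting an order)
#   mysqldump -u root -p shop_db > after.sql    (after submitting an order)
# Simulated here with two tiny dumps (hypothetical table and rows):
printf "INSERT INTO orders VALUES (1,'widget');\n" > before.sql
printf "INSERT INTO orders VALUES (1,'widget');\nINSERT INTO orders VALUES (2,'gadget');\n" > after.sql

# diff exits non-zero when the files differ, hence the || true.
# The lines marked ">" are the rows the new order created.
diff before.sql after.sql || true
```

On a small database this pinpoints the inserted rows immediately; it falls over, as described above, once the dumps grow to hundreds of megabytes.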

Actual Solution

Finally, I came to a conclusion. Web apps and SQL databases communicate over the network, using the connection details specified in the config files. I just needed to monitor that traffic (each database speaks its own wire protocol; MySQL, for example, does not use HTTP) with some kind of network sniffing tool. The network sniffer is an excellent option because it lets you see not only the SQL calls but also any web-service requests. This worked well, but I found an even better solution: many databases can log every SQL statement they execute.
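To make the sniffing idea concrete, here is a minimal sketch using tcpdump, assuming MySQL on its default port (3306). Capturing packets requires root, so the sketch only builds and prints the command rather than running it:

```shell
# -i any: listen on all interfaces; -A: print packet payloads as ASCII so
# the SQL text is readable; -s 0: capture full packets rather than headers.
# Port 3306 is MySQL's default; substitute your database's port as needed.
SNIFF_CMD="tcpdump -i any -A -s 0 tcp port 3306"
echo "Run as root: sudo $SNIFF_CMD"
```

tcpdump's output is raw, so in practice you would pipe it through grep for keywords like INSERT or UPDATE; a tool such as Wireshark can decode the MySQL wire protocol properly if you need more detail.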

Sample MySQL Solution

In my case, I was using MySQL, so I just had to open a MySQL client session (as a user with privileges to change global settings) and enter:

-- Enable the general query log
SET GLOBAL general_log = 'ON';
-- Find the location of the log file
SHOW VARIABLES LIKE 'general_log%';

Now that we have the location of the log file, all we have to do is monitor it in real time. I like to use the Linux tool tail for this:

tail -f '{general_log_location}'

For my application, I would submit a new order, and the terminal running tail would display every executed SQL statement. Now I knew which tables were being updated and what content was being inserted. With this information, rewriting the checkout process was doable.
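As a rough illustration of that step: the general log mixes reads and writes, so filtering for data-changing statements surfaces the tables the checkout actually touches. The log lines and table names below are made up, in the general log's format:

```shell
# Simulate a small slice of a MySQL general log (hypothetical queries):
cat > general.log <<'EOF'
2023-01-01T12:00:01 42 Query SELECT * FROM cart WHERE session_id = 'abc'
2023-01-01T12:00:02 42 Query INSERT INTO orders (id, total) VALUES (17, 99.95)
2023-01-01T12:00:02 42 Query UPDATE inventory SET qty = qty - 1 WHERE sku = 'W-1'
EOF

# Keep only the statements that change data; the SELECT noise drops out.
grep -E 'INSERT|UPDATE|DELETE' general.log
```

The same filter works live against the real log: tail -f piped into grep -E shows, in real time, exactly which rows a new order creates or modifies.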


In the end, if you need to rewrite a feature that pushes data into the database for other features, then database logging is a good option. The network sniffer has the advantage of displaying both SQL and web-service requests; the disadvantage is having to filter out all the "noise" created by other network traffic. The built-in database logging feature has the advantage of simplicity and clarity.

tl;dr If you need to add features and bug fixes to an existing feature but the code is too messy to modify, rewriting the original feature may be the only option. Because the existing feature inserts database rows that other features rely on, the rewrite must honor that database contract; you can't simply start from scratch. The easiest way to learn the contract is to have the database log all incoming SQL statements. If the database doesn't have that functionality, use a network sniffer. Use this information to rewrite the existing feature to conform to the database contract, and then implement the new features.