Justin Francis Self-Portrait

Saturday, May 19, 2007

The Other Source Control

Writing software without source control these days is insanity. Having worked in professional environments both with and without source control, I know the pain of not having it, and the ease with which it can be introduced. Now, I never write anything that is not under source control, even if I am working alone. It is so free and the benefits so clear that I don't even question it anymore.

I think that almost everybody (except maybe Linus Torvalds) agrees that source control is critical to software development. For all its other shortcomings, the Joel Test has source control right at the top of the list.

What has strikes me, however, is how little concern is normally given to source control of the database. Especially if one is not using an In-Memory Model, your database is critical to the correct functioning of the code. Yet I rarely hear about shops that control change to the database in the same way they control changes to the code.

I'm not talking about changes to the data housed by the database (analogous to instance changes in the running code), but changes to the structure of the database. I can remember countless times where all of a sudden, our application would stop working. It turns out someone applied a database change directly on production. Other cases include asking "what changed in the database between version 1.1 and 1.2 of the system", "what do we need to do to the database when we release this version" and "what do I need to do to run a previous version of the software"? These questions are difficult to answer accurately without all change to the database being controlled just like all other changes to the code.

At work, we have spent a substantial amount of time working to bring the database in line with solid change control for almost a year. We are finally at the point where the code is able to detect the version of the database it is running on, and is able to either upgrade the version or roll it back to a previous version depending on what version of the code is currently running. The only changes that are applied are those that have been checked-in to the code repository as being required to run on the next release of the software.

We have found that controlling change in this way provides all the same benefits source control for our code provides. And in the end, doesn't this make sense? Doesn't it defy logic that part of our code should be controlled, while part of it (db schema) should not be? Now our database acts just like our code: changes are logged historically, changes are fully tested, changes are applied once per release on production, and the only changes made are those that are fully automated.

Like code source control, I will never look back.