Justin Francis

Monday, October 29, 2007

QA is Expensive

Is it "obious day at camp stupid?" Maybe, but quality assurance is still expensive, and people (especially stakeholders) sometimes like to forget this fact. In this context I am using QA to refer to the final testing of a build as a whole. Our team does not have dedicated QA staff, so every two week iteration the entire team takes from one to two days to test the build. That is 10%-20% of the total effort of an iteration. Read that line again.

Stakeholders, however, still get (understandably) angry when a bug makes it into the production system. In any given iteration, we usually end up shipping a patch after the fact that fixes something, though typically something minor. I bring it up because that is our strategy for making up for the deficiency in the QA effort: let the users test it.

It sounds horrible, but the best testers of any system are real users trying to use the system in the real world. They find bugs ridiculously fast. This might lead you to the idea of having users test a preview of the release. It is a good idea, but it does not work for business applications, because there is usually only a single instance of the production software running at a time.

Unfortunately, there really is no alternative except to spend more money testing each build. Upper management is not going to fork over that money unless there really is a need to be 99% bug-free on delivery day. This is usually not the case unless you are shrink-wrapping. And let's face it, you're not.

If that is not enough to dissuade you, consider that dedicated QA staff cost more than money: they also add lag time between finishing a build and delivering it (you cannot test a build before it is finished, at least not the kind of QA I am talking about here). The QA staff must be handed the build, and the development team must be handed back a list of bugs, at which point the process repeats. In the meantime, the team has moved on to a new build and is no longer focused on the old one. So builds end up being delivered half-way through an iteration instead of on iteration boundaries.

I have found that if you patch the bugs users do find (the important ones, see my last post) in a reasonable time and with a reasonable attitude ("thanks for reporting that", "must have slipped past our two days of testing"), the users will not mind. Instead they will worship the ground you walk on for reducing QA time and giving them more effort to spend on new development. Pause not.

Friday, October 12, 2007

Fixing Bugs is not Free

When wandering the halls, I often hear comments from users about little bugs (usually display bugs), and I tell them straight up that in all likelihood the bug will never be fixed. The typical response is a gasp, followed by a smug look that means something along the lines of "I could write software better than these amateurs."

I have also told developers who report small bugs that "we'll wait for a user to report that," with similar results. I then have a choice to make: try to convince them I actually know what I am doing, or leave them thinking I'm a buffoon. Here is the argument I make.

Fixing bugs is just like building new features. It is not free. Each bug users want fixed costs effort (points, in our agile methodology). Bugs are usually much cheaper to fix than new features are to build, but the cost is certainly not insignificant.

If bugs cost effort to fix just like anything else, then they must be estimated and scheduled just like everything else. This is where the key lies. When confronted with the choice between refining an existing feature (let alone fixing a bug) and creating a new feature, stakeholders will almost always opt for the new feature (this leads to a kind of functional but rough monolith, but that is another post). This means that bugs, especially ones that don't really hurt anybody, are the least likely items to get scheduled. And so they never are.

I should make a note about critical bugs. If a critical bug (one that prevents an important feature from working and has no workaround) is found, we fix it immediately (forget iterations), but even these are not free. After the fact, we estimate the fix and then push an appropriate number of items from the current iteration to make room for the bugfix, just as if a stakeholder had scheduled it.

Surprisingly, systems I have built using this strategy are not as buggy as one would expect, though that probably has more to do with Test-Driven Design than anything else. The point is that if you do things properly, this strategy not only works but works well. We hardly ever schedule bug-fixes at work, and when we do, they are usually almost as large as features.

Once this is explained, I wait a few weeks and then circle back. The person in question is usually impressed with the features we have delivered in that time and is no longer concerned about the bug, which they don't even notice anymore.

Sunday, September 23, 2007

Multi-Project Agile

We have just inherited a new project, with its existing codebase and developers. The team has grown, and so has the size of the codebase we need to maintain. We are now facing questions about how to plan for both our primary project and this secondary project, which is being end-of-lifed. We are not talking about two major components of the same application; the two projects have different architectures, languages, stakeholders and developer expertise.

A few preliminaries before we get to the real problem. We decided to have the iterations for both projects run on the same schedule, to reduce problems with planning and tracking velocity (described below). We also kept the code repositories and project-planning software instances separate.

The big problem is this: how do we apply agile planning to both of these projects with a single team? From what I can see, there are two major ways to tackle it. The first is to divide the team in some way (by day, by developer, etc.). The second is to ignore the fact that the second project is different and plan the team's time as a single entity.

There are a number of ways to divide the team. We could devote all of a given developer's time to one project, or we could rotate developers between projects a day or a week at a time. The latter would be preferable, because then all developers are exposed to both projects. Either way, this essentially creates two teams, whether actual groups of developers or simulated ones (divided by time). We would then have two iteration planning meetings, one for each block of time. The problem with this approach is that the stakeholders cannot control the amount of effort spent on each project. Because there are conceptually two teams with two velocities, they must plan them separately and make concessions separately.

Intuitively, however, I think the second option holds more promise. In this scenario, the extra project works just like a separate component of the existing project. The team has a certain velocity, which the stakeholders can apply to both projects as they feel is appropriate. This means a single iteration planning meeting, with a single velocity and the union of all stakeholders, to plan the team's iteration. The major problem is that it is messier for developers and planners: developers face more severe context-switching, and planners will probably need tools that can do multi-project planning in order to track velocity at the team level instead of the project level.

In the end, we have opted for the second option because of the flexibility it adds. It will be rough for us internally until we get things running smoothly. It is a crummy situation all around to have to maintain an inherited system. Planners and developers will hurt, but it is an experience everyone should go through (just like working on that crummy system), because it drives home the principles put forth in agile methodologies and in simply good programming practice.

Saturday, September 8, 2007

Is Database Version Control Worth It?

In this second post on the subject of automatic database upgrading, I discuss the merits of the system whose implementation is described in the previous post.

I won't be circumspect; it is not cheap to make auto-deployment work. It took one programmer two weeks to implement, and we probably spend about one day a month maintaining the system. In the end, it comes down to one fundamental factor: the number of environments to which you deploy. But first, the pros and cons.

The biggest advantage of this system is that deployments can be fully automatic. They can be scheduled for a specific time, and everything gets upgraded. There is no waiting on the DBA, and no double-checking that the changes are all correct and in the proper spot.

Similarly, the deployment of the new "database version" becomes as solid and consistent as the deployment of the code. The deployment of the database necessarily becomes part of the testing phase of an iteration. This means that deployments are more often successful because they are better controlled and better tested.

The one big disadvantage is complexity. There is a lot of complexity in maintaining a database version. I am not convinced, however, that this complexity is due to the automatic deployment. Rather, I think that the deployment merely exposes problems that would otherwise be hidden when deployment is done manually.

For example, the system may boot fine even though a particular rollback script was never run, but the deployer will stop the boot because the upgrade fails without that rollback. This would be hidden in a manual deployment, but is exposed during an automatic one.

But by far the biggest determining factor is the number of instances you need to deploy to. Just on the development side of things, we have a unit-testing database, a deployment database (to unit-test the upgrade scripts), the full development database, the trimmed (lean data) development database and a number of personal development database instances. Then there are the testing, staging and production databases.

Without automatic deployment, a developer who changes the database must publish the change (run it) on each of those databases. If they do not, tests begin to fail and servers fail to start as other developers update to the latest source, whose required database changes have not yet been applied. It is then left to those other developers to run the changes themselves, once they figure out why their servers will not boot.

With the automatic deployment, none of this is necessary. Upgrading to the latest version of the source will also upgrade any databases being used to the appropriate version.

For us, with only a half-dozen or so databases, it is worth it twice over. I never have to come in before dawn to launch a version, and I never have to tell the other developers they need to run change scripts as they appear. My code changes and database changes have been harmonized; they both work in the same way.

Everything just works. And that is the best compliment a user can give to a piece of software.

Friday, August 17, 2007

Database Version Control

This first post provides an overview of the automatic database upgrading and downgrading tool we have built into our application at work. The next will reflect on the advantages and disadvantages of this approach.

I have already posted on the importance of versioning the database. This post describes the next step we took in the same vein: automatically running the versioned SQL change scripts to upgrade the database from one version to the next.
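For illustration only, a pair of change scripts might be organized and loaded something like the sketch below. The directory layout, names and helper are assumptions of mine, not the actual structure we use.

    # Assumed layout under source control:
    #
    #   db/changes/0007_add_customer_table/upgrade.sql
    #   db/changes/0007_add_customer_table/rollback.sql
    #   db/changes/0008_add_invoice_index/upgrade.sql
    #   db/changes/0008_add_invoice_index/rollback.sql

    import os

    CHANGES_DIR = "db/changes"  # assumed location of the versioned change scripts

    def load_change_scripts(change_name):
        """Return the (upgrade_sql, rollback_sql) pair for one change directory."""
        base = os.path.join(CHANGES_DIR, change_name)
        with open(os.path.join(base, "upgrade.sql")) as f:
            upgrade_sql = f.read()
        with open(os.path.join(base, "rollback.sql")) as f:
            rollback_sql = f.read()
        return upgrade_sql, rollback_sql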

This was not as easy as I would have thought.

Upgrading to a new version of the software itself is pretty trivial; all you do is replace the static environment (the code) and reboot. The tricky part is upgrading the dynamic environment (the data). Previously, all of this was done manually by a database administrator during the launch.

The implementation of the auto-deployment provides three major functions: upgrade(), rollback() and redeploy(), one of which can be run on system startup to ensure the database and code version match.

Upgrade takes the latest change scripts from source control, runs them, and then inserts a new row into a version table in the database, recording the new version along with the script it ran and the rollback scripts that undo the upgrade.

The rollback scripts must be stored at upgrade time because when it comes time to roll back, we need to roll back not what is currently under source control, but what was actually run when the last upgrade happened. In addition, if you only store the rollback scripts for the current version, then when you roll back (see the deployer, below) you are running a previous version of the software, which does not have access to the future version's rollback scripts under source control.
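As a rough sketch only, assuming a DB-API style connection, a version table named db_version, and a hypothetical pending_change() helper that returns the scripts newer than the current database version, upgrade() might look something like this (transaction handling is discussed further down):

    def upgrade(conn, code_version):
        """Run the pending change scripts and record them with their rollbacks."""
        # pending_change() is a hypothetical helper returning the upgrade and
        # rollback SQL for the change scripts newer than the database version.
        upgrade_sql, rollback_sql = pending_change(code_version)
        cursor = conn.cursor()
        cursor.execute(upgrade_sql)  # real scripts may need splitting into statements
        # Store the rollback alongside the version so a future rollback runs what
        # was actually applied, not whatever happens to be in source control then.
        cursor.execute(
            "INSERT INTO db_version (version, upgrade_script, rollback_script) "
            "VALUES (%s, %s, %s)",
            (code_version, upgrade_sql, rollback_sql),
        )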

Rollback will simply run the rollback scripts stored in the version table for the latest version, then delete the row from the table.

Redeploy will run the rollback, then run the upgrade and re-insert the version row into the version table. This is extremely useful during development when new database changes are coming in over the course of an iteration, and a single upgrade is insufficient.
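Under the same assumptions as the upgrade() sketch above, rollback() and redeploy() could look roughly like this:

    def rollback(conn):
        """Run the stored rollback for the latest version, then delete its row."""
        cursor = conn.cursor()
        # Assumes the version column sorts in deployment order.
        cursor.execute(
            "SELECT version, rollback_script FROM db_version "
            "ORDER BY version DESC LIMIT 1"
        )
        version, rollback_sql = cursor.fetchone()
        cursor.execute(rollback_sql)
        cursor.execute("DELETE FROM db_version WHERE version = %s", (version,))

    def redeploy(conn, code_version):
        """Undo the latest upgrade, then run it again with the current scripts."""
        rollback(conn)
        upgrade(conn, code_version)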

Which of the three functions gets run on server startup is up to the deployer(). The deployer checks the database version (stored in a table in the database) against the code version (stored in __version__ in Python).

If the code version is not in the version table, an upgrade must be done. If the code version is in the version table, but the current code version differs from the database version, a rollback must be run.

Finally, if the versions match but the upgrade script stored in the database does not match the one under source control, a redeploy is performed.
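Putting the three rules together, the deployer's decision could be sketched as below. It reuses the assumed db_version table and the upgrade/rollback/redeploy sketches from earlier; the code version would come from __version__, and the current upgrade script would be read from source control.

    def deployer(conn, code_version, current_upgrade_sql):
        """Pick upgrade, rollback or redeploy at startup, per the rules above."""
        cursor = conn.cursor()
        cursor.execute(
            "SELECT version, upgrade_script FROM db_version "
            "ORDER BY version DESC LIMIT 1"
        )
        row = cursor.fetchone()
        db_version, stored_upgrade_sql = row if row else (None, None)

        cursor.execute(
            "SELECT 1 FROM db_version WHERE version = %s", (code_version,)
        )
        known = cursor.fetchone() is not None

        if not known:
            upgrade(conn, code_version)              # code version never deployed
        elif code_version != db_version:
            rollback(conn)                           # database is ahead of the code
        elif stored_upgrade_sql != current_upgrade_sql:
            redeploy(conn, code_version)             # same version, changed script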

Each of these operations is wrapped in a transaction. The first problem we ran into was how to handle bad change scripts: the upgrade would fail half-way through, leaving the database neither upgraded nor in its original state. Wrapping all three operations in transactions ensured this horribly messy, horribly frequent problem did not recur.
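One simple way to get that all-or-nothing behaviour, assuming the connection's commit() and rollback() delimit the transaction, is to run whichever operation the deployer picks through a wrapper like this sketch:

    def run_in_transaction(conn, operation, *args):
        """Commit only if the operation succeeds; otherwise leave the database untouched."""
        try:
            operation(conn, *args)   # upgrade, rollback or redeploy from above
            conn.commit()
        except Exception:
            conn.rollback()          # a bad change script rolls everything back
            raise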

One major unsolved problem remains: bad rollbacks. If a bad rollback script is committed alongside a valid upgrade, the pair is inserted into the version table successfully. When the rollback is actually attempted later, it fails, and there is no clean way to fix it because the broken script is already stored in the database. Our workaround is simply to replace the rollback script directly in the database with the latest one from source control.

The next post will come to a conclusion about whether all of this is worth it, and how much it really does cost.

Sunday, August 5, 2007

Unbelievers

Introducing process into a company is always slow and difficult. I have been gradually introducing agile processes into my current company over the last two years. This week, however, I realised that the work will never be complete. In a sentence: there will always be unbelievers. There will always be people who just don't like process, who simply cannot work with structure. They are usually the same people who do not prepare for meetings, who don't read long emails, and who like the idea that when they say things, others jot those things down and get them done right away. The good news is that there are ways to handle these people.

First, convert as many people into believers as possible (whether from agnostics or otherwise). Early in the project, target a specific group or department. Then, using the process, show them that it works, how they can be involved, and how that benefits everyone. The more believers you have, the easier it is to convince others of the merits of the process. I have found that these believers are often stronger advocates for structure than I am. They see the way things used to work and how much better they are now that the process is in place. They understand in a very concrete way how others' jobs could be improved by the same structure. Many of these believers even begin to take attacks on the process personally, and there is no better advocate than that (not that we would ever discourage criticism of our process).

Second, strictly enforce the process for the unbelievers. Ruthlessly enforce meeting agendas, and only accept feature requests at the appropriate time. In other words, force the unbelievers to use the process to get anything done at all. Once you see that they understand it and are starting to use it (or have at least admitted to violating it), start relaxing the enforcement. Show them that things go more smoothly when they at least try to work within the formal structure. Nobody likes a fascist development team, but it is critical that you force the unbelievers to start using the process, because if you don't, they will continue to circumvent it forever.

Finally, relax the process for believers. A methodology exists to enable, not to restrict. There are certain things that should rarely be compromised; releasing only on iteration boundaries is a good example. Yet we routinely release patches if they are important enough. The reason is practical: a patch represents little risk and great benefit. Moreover, if you are impractical in your enforcement of the process, you may start losing believers. You make exceptions for believers because they know and love the process; it is just that in this case it broke down for them, or they made a mistake. The point is that they are not trying to undermine the structure, and that means they are working with you.

At the end of the day, you are just looking for people who are, for the most part, working with you, working with your process, and helping you to deliver software. For this to happen, you need to deal with the unbelievers by punishing them, rewarding those who change, and keeping your true believers happy.

Saturday, July 21, 2007

Build vs Buy For Core Business Tools

Before entering the fray, I need to mention that the argument I put forth here is tailored to a very specific question, one that has been evaluated a number of times at the company I work for: whether to buy or build a solution to automate the company's business processes. It assumes a competent development team is currently available, and I am discussing a build-vs-buy decision in a small company (a hundred or so employees). While this post was motivated by a specific decision, I only lay out arguments here that are generally applicable.

Probably the biggest reason management likes the idea of buying software is that it is a quick fix available today; they do not have to wait for a solution to be built in-house. This is, however, not entirely accurate. The last time we adopted a pre-built solution, it was six months after the purchase date before the first user began to use the software. Even though the software is available right away, it takes time for people (including IT) to learn the new system, adapt their processes to accommodate it (more on this later) and, most importantly, to trust the system enough to abandon their old process.

So time to adoption may be long. If it is also the case (as it often is) that the business really needs only a small subset of the pre-built software (a ticketing system, say), it may be easier on the company to have an agile team release an in-house solution gradually, allowing the business to adapt as it goes instead of switching all at once to a pre-made product. It may even turn out that the time to build, with concurrent adoption, equals the time to adopt the third-party system.

This leads directly to the question of how much development work will be required in either case. There is no way anyone can tell me that a pre-built solution will not have to be customized once it is purchased. In fact, a significant amount of customization has been done on every solution we have purchased. Because the software is foreign, customization may mean buying it from the vendor (with all the lack of control that entails) or, if you are lucky, having your own development team do it. In the latter case, the work is not as cheap as development of the same complexity on an in-house solution, because the developers did not infuse the foreign software with their own philosophy and quality requirements. And again, the customization cost of the purchased software may not be significantly different from the cost of developing in-house just the subset of functionality the business really requires.

A major reason I am a proponent of agile methodologies is that the business I work for changes requirements almost weekly, depending on the department. This can cause major problems with a pre-built solution; it could even mean constant customization of someone else's product. The flexibility of pre-built solutions is definitely questionable, which means that more often than not the business ends up adapting to the software, and not the other way around. This leads to the long adoption time I mentioned above. It is even more of an issue when the software relates to the core of the business, because the usually over-generalised software ends up telling the company how to do business (how to handle support calls, how new customers apply, how to pay sales agents, how to sell, etc.).

There is also the cost of maintenance to consider. At 10%-20% of the cost of the software per year, this is not insignificant. The same argument about customization applies to maintenance as well: developers will be more efficient maintaining their own system than someone else's, if maintaining it themselves is even a possibility. Sometimes you are dependent on the vendor, and even assuming they are reliable, they may not be very responsive.

Finally, and perhaps most importantly, you may lose your development team by purchasing software. The best developers do not want to do maintenance; they want to do development. If they are maintaining a purchased solution, you had better hope it is high quality, well built and in a modern language (did I just cut out 85% of off-the-shelf software?), because if not, you will have a hard time attracting good developers.

For us, it seemed a no-brainer. We would end up customizing the thing anyway, it would still take six months before it was in use, and maintenance would still be a problem. Considering that it might take about two months to build the required functionality into our already-built enterprise management system, I cannot understand why anyone would consider buying an off-the-shelf solution. Yet if we had not reminded the executives of these considerations, I might have been working on a filthy Perl application, and probably looking for a new job.