When I was just starting my career, I worked on a system that was designed from the beginning to be fully distributed. Big Design Up Front in full effect. The system’s job was to parse huge batches of XML files, reconcile that data with an existing data model, and then execute a series of business rules on the results. The whole thing was divided up into components that would communicate asynchronously through JMS. We will be able to scale everything independently! Everything is decoupled!

Even today, building out the infrastructure of such a system takes serious effort. Back in those ancient times, it was considerably harder, and probably 50% or more of the development effort went into infrastructure work. Huge amounts of work went into managing the deployment of the system. Hours and hours spent managing dependencies. Lots of diagrams drawn and long explanations of how everything fit together.

In the olden days, CI was Ant scripts, baby!

The initial deployment of the system was done on just one machine for the sake of simplicity. We ran load tests on it. The single machine configuration easily handled the estimated max load. It went into production on a single machine, and when I checked in almost 8 years later, it was still running on a single machine. In all likelihood, literally the same physical machine. How quaint!

So all that time and energy (AKA money) spent on complicated asynchronous messaging infrastructure? Wasted. The better approach would have gone something like this: Let’s focus on the core domain logic first. Let’s build a full path for a subset of our functionality, so we can take some part of our XML data, parse and process it, and get a result. We’ll still be sure to keep things decoupled, but we’ll do it through good software design principles1. We’ll build out the simplest possible infrastructure to support it.
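To make that concrete, here’s a minimal sketch (all names hypothetical, not from the original system) of what “decoupled through design, not infrastructure” might look like: the domain code talks to a small interface rather than to JMS directly, so the simplest in-process implementation works on day one, and a messaging-backed one could be swapped in later without touching the domain logic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the domain code depends only on this interface,
// not on JMS or any other messaging infrastructure.
interface ResultSink {
    void accept(String recordId, String result);
}

// The simplest possible "infrastructure": an in-process sink.
// If load tests ever demand it, a JMS-backed implementation could be
// swapped in here without changing OrderProcessor at all.
class InProcessSink implements ResultSink {
    private final Map<String, String> results = new HashMap<>();
    public void accept(String recordId, String result) { results.put(recordId, result); }
    public String get(String recordId) { return results.get(recordId); }
}

// Stand-in for the domain logic: parse one XML record, hand the
// result to whatever sink was injected.
class OrderProcessor {
    private final ResultSink sink;
    OrderProcessor(ResultSink sink) { this.sink = sink; }

    void process(String recordId, String xml) {
        // Crude extraction for illustration only; a real system would
        // use a proper XML parser.
        String amount = xml.replaceAll(".*<amount>(\\d+)</amount>.*", "$1");
        sink.accept(recordId, "amount=" + amount);
    }
}
```

The point isn’t the interface itself; it’s that the decision about *where* `ResultSink`’s implementation runs is deferred until there are numbers to justify it.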

OK, now we can think about our tough decision of how to deploy this. The difference is now we have a lot more information. For one thing, we can do a reasonably accurate load test. Since we are only processing, say, 5% of the XML, just use 20 times more input. So we do that and find our system can handle it fine. No need to go through all that pain of distributing things! Let’s keep things simple, and keep running our load tests every so often as we build out more functionality. If we get some bad load test results, then we can start thinking about distributing2 (after we look for easy-to-solve bottlenecks first)! And since we designed our software well, making the change to a distributed architecture is no big deal.
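The “multiply the sample” load test above could be as simple as the following sketch (hypothetical names, assuming the slice we’ve built covers roughly 5% of the real feed): replicate the sample batch 20 times, run it through the processor, and see what throughput comes out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical load-test harness: approximate full production load by
// replicating the sample input, then measure records per second.
class LoadTest {
    // Replicate the sample batch 'factor' times (e.g. 20x a 5% slice).
    static List<String> scale(List<String> sample, int factor) {
        List<String> scaled = new ArrayList<>();
        for (int i = 0; i < factor; i++) scaled.addAll(sample);
        return scaled;
    }

    // Run the processor over the scaled input and report records/second.
    static double throughput(List<String> input, Consumer<String> processor) {
        long start = System.nanoTime();
        for (String record : input) processor.accept(record);
        double seconds = (System.nanoTime() - start) / 1e9;
        return input.size() / seconds;
    }
}
```

If the measured throughput comfortably exceeds the estimated peak load, that’s the hard number that lets you skip the distribution work with a clear conscience; if it doesn’t, you profile for cheap bottlenecks before reaching for JMS.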

The other advantage of this approach is that we get something out in front of the customer much sooner. I’ve seen other projects burn a bunch of time early on setting up all kinds of infrastructure, only to find out, when they finally got the product in front of some customers, that they had gone entirely in the wrong direction.

I want to sleep well at night. Some things that make me sleep well:

  1. Clean, well designed, well tested code
  2. Hard numbers to back up big decisions
  3. A customer telling me they like what they see

Number one you can always do. Two and three work best when you wait!

1 This means designing the system so that the parts we may need to change in the future are easy to change.

2 Distributing and scaling out is not always the best solution. Before diving into the effort to distribute the system, you need to ask: can we get away with just putting the whole thing on a faster machine? Is being able to scale parts of the system independently really going to save us money? Virtual hardware is cheap! Developers are expensive! Fodder for another blog post :)