Handling the Ball of Mud
Or, what to do when your organically grown Perl application gets too big for its (and your) own good
Fred Moyer
Red Hot Penguin Consulting LLC
Who the heck am I?
- Hacking Perl for about 7 years
- Mechanical Engineering and Computer Science background
- Worked for startups as well as big corporations
- I've created balls of mud
- I've worked on balls of mud
What's a Ball of Mud?
- A system that has no distinguishable architecture
- http://en.wikipedia.org/wiki/Big_ball_of_mud
- Common in large, rapidly developed applications
- Most common in startups, but also found in a lot of Perl codebases
- Cost, Quality, Time - pick two
- Cost (scalar(@engineers)) is fixed
- Time is usually fixed (needs to be done right now!)
- So Quality must give
- Systems are large in complexity of business logic, not necessarily large in LoC
The Startup
- Startups are great places for creating balls of mud
- You and your huge development team (usually two to three engineers if you are really lucky) can try out new ideas fast
- In a startup, you have to constantly adapt to rapidly changing business requirements
- If you don't adapt, your competitors will destroy you, or you will run out of money quickly
- Once ideas start sticking, the application starts to generate revenue
- Great for the business
The tools group of an organization
- Usually one experienced Perl developer and a number of other technical specialists
- You're the expert in Perl
- But you're not the only one writing the code
- When there's a problem though, you're the one that gets called
- You spend a lot of time maintaining code
Evolution of the Ball of Mud
- Development is driven by organizational needs
- Not always enough information or time to make perfect decisions
- Everything has a purpose at some point in the codebase lifetime
- The Perfect is the Enemy of the Good Enough
- When a piece of code is written that gets the job done, it's usually considered 'ready for production'
Evolution of the Ball of Mud
- Perl allows product management, a technical specialist, or a programmer with a new idea to write a proof of concept without risking a whole lot
- Rarely do you have time to architect a perfect solution
- One of Perl's greatest strengths is that you can generate prototypes very quickly
- Another of Perl's greatest strengths is that those prototypes can be optimized to perform very well without needing to be replaced by a C program
Evolution of the Ball of Mud
- So you've got a large codebase, and it's ugly, but it works
- And it's mission critical
- And forward development is expected to continue at a steady pace, or increase
- "Why do we need to do this refactoring thing, everything works just fine?"
- An analogy I use here: your codebase needs maintenance, just as your car needs regular oil changes and the airplanes you fly in need regular servicing
Evolution of the Ball of Mud
- The maintenance cost of the app is a visible concern (software fire drills)
- When something breaks it needs to be fixed fast
- The test suite doesn't pass completely anymore
- And you also have to focus on scaling the system, since you are experiencing success
- You look at the code you wrote six months ago and shake your head
Evolution of the Ball of Mud
- You now have a ball of mud
- The good news is that it generates revenue, usually a lot of revenue
- Or it makes your team indispensable to the organization
- The bad news is that it's higher maintenance than Paris Hilton
- The other bad news is that when new developers touch it, they break it
- The other other bad news is that you sometimes break it when you touch it
Here comes Downtime
- How much does downtime cost you?
- You have dozens or hundreds of servers running this code
- You push out a code change, and something goes wrong
- Assume a 100-person company with operating costs of roughly $1 million per month
- One hour of downtime then costs $1,000,000 / 30 days / 8 working hours, about $4,200
- The real cost is much higher though in terms of morale and lasting impact
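The back-of-the-envelope figure above can be reproduced in a few lines of Perl. This is only a sketch: the 30-day month and 8-hour working day are the assumptions implied by the division on the slide, and the sub name downtime_cost_per_hour is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: spread the monthly operating cost evenly over working hours.
sub downtime_cost_per_hour {
    my (%arg) = @_;
    return $arg{monthly_cost} / $arg{days_per_month} / $arg{hours_per_day};
}

my $hourly = downtime_cost_per_hour(
    monthly_cost   => 1_000_000,    # figure from the slide
    days_per_month => 30,           # assumption
    hours_per_day  => 8,            # assumption
);

printf "One hour of downtime costs about \$%.0f\n", $hourly;
```

Plugging in your own monthly costs and working hours gives a number that is easier to put in front of management than "downtime is bad".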
How to Handle the Ball of Mud
- Now YOU have to work on this thing
- Test the code before you touch it
- No test? Write one!
- "It's too hard to test!"
- Just step up and write the test
How to Handle the Ball of Mud
- Follow existing test design patterns
- Don't rearchitect yet
- OK to fill in gaps with a new type of test
- User-facing functionality is hard to test; a module API is easier
- Big problems are hard to solve, small problems are easy, so look at your system as lots of small problems instead of one big one
- Write lots of small tests as opposed to a few big tests
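As one way to picture "lots of small tests", here is a minimal sketch using Test::More. The normalize_email sub is a hypothetical stand-in for a single function in a module API; it is not taken from any real codebase.

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Hypothetical stand-in for one small function from a module API.
sub normalize_email {
    my ($email) = @_;
    die "no email given\n" unless defined $email && length $email;
    $email =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    return lc $email;
}

# Each test checks exactly one behaviour, so a failure points at one cause.
is(normalize_email('Fred@Example.COM'), 'fred@example.com', 'lowercases the address');
is(normalize_email("  fred\@example.com \n"), 'fred@example.com', 'trims whitespace');

eval { normalize_email('') };
like($@, qr/no email given/, 'dies on empty input');
```

When one of these fails, the test name alone usually tells you what broke, which is rarely true of a single large end-to-end test.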
Frameworks are not silver bullets...
- "Let's use a framework to add some order. It will fix all our problems!"
- Be very careful with this approach; it's a lot more difficult than you think
- Only a good approach when you have really outstanding test coverage, and you are sure that the framework fits your application design patterns
Refactoring is a Winchester pump...
- Refactoring can be your single biggest friend
- It is paying off technical debt
- Management asks "What value do we gain here?"
- The value proposition is not so much what you gain but what you don't lose
- Refactor _away_ from the big ball of mud, not _towards_ it
The N-thousand line program
- Very difficult to figure out if it's doing the right thing
- Hard to automate, but not impossible
- You can't just move all the subroutines into modules without breaking something
- So set up scaffolding around the program
The N-thousand line program
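use strict;
use warnings;
use Test::More tests => 3;
# Load the script into package main. The explicit './' is needed on
# Perl 5.26+, where '.' is no longer in @INC. program.pl must return true.
require_ok('./program.pl');
$main::var = 5;    # override globals before exercising the subs
# The script's subs live in package main, so they can be called directly
cmp_ok(addition(1, 1), '==', 2, '1+1 returns 2');
# should_die() is expected to die with a message containing "oops"
eval { should_die() };
like($@, qr/oops/i, 'should_die() died');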
The N-thousand line program
- With a test in place you can port the subroutines into modules, but watch out for globals!
- program.pl should refactor down to a command line interface that wraps a module
- Look at qpsmtpd for a good example (http://wiki.qpsmtpd.org)
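use strict;
use warnings;
use Getopt::Long;
use My::Module;
# process arguments into %args ('verbose' is only a placeholder option)
my %args;
GetOptions(\%args, 'verbose') or die "usage: $0 [--verbose]\n";
my $runner = My::Module->new(\%args);
$runner->run();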
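What the module side of that wrapper might look like is sketched below, with the "watch out for globals" warning in mind. Everything inside My::Module here is an assumption for illustration: the var attribute stands in for a former script-level global, and the scale method for a subroutine that used to read it.

```perl
use strict;
use warnings;

package My::Module;

# State that used to live in script-level globals is passed in explicitly.
sub new {
    my ($class, $args) = @_;
    my $self = { var => 1, verbose => 0, %{ $args || {} } };
    return bless $self, $class;
}

# Hypothetical port of a sub that previously read the global $var directly;
# it now reads instance state instead.
sub scale {
    my ($self, $n) = @_;
    return $n * $self->{var};
}

sub run {
    my ($self) = @_;
    my $result = $self->scale(2);
    print "scaled result: $result\n" if $self->{verbose};
    return $result;
}

package main;

my $runner = My::Module->new({ var => 5, verbose => 1 });
$runner->run();
```

Because the former global is now constructor input, a test can build two objects with different values and never touch package variables.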
The test suite that runs forever
- As the codebase grows fast, you try to keep up with testing
- More tests equals more time to run the test suite before checkins
- At some point, you hit a piece of code that's hard to test so you write a bad test
- Or you have to make a small code change and don't want to run the entire test suite
- Whatever the case, the test suite is a problem now, instead of a solution
The test suite that runs forever
- The core issue here is that the problem is too big to handle
- So you need to break it up into smaller, more manageable pieces
- If only there were a good design pattern to follow here
- But there is, and you use it every day! (CPAN)
- Trying to fix a really big testing suite while continuing forward feature development is usually a losing battle
- The CPAN approach has proven scalable (more than 10k modules at last count)
MyPAN
- Identify parts of your code that are loosely coupled
- My::Log and My::Config are good starting points
- My::Model is another good candidate
- 'h2xs -X' is your friend, start simple
- If you can split your one big app into a few medium sized apps, the test suites for each become much more manageable
- Identify entrenched design patterns in your code and create modules from those
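To make the idea concrete, here is a rough sketch of how small a first extracted module and its own test can be. The My::Config name comes from the slide, but its API (new taking "key = value" lines, and get) is purely an assumption for illustration. In a real distribution the package would live in lib/My/Config.pm and the test in t/; both are shown in one file here so the sketch runs as is.

```perl
use strict;
use warnings;

package My::Config;

# Hypothetical minimal API: parse "key = value" lines into a hash-backed object.
sub new {
    my ($class, @lines) = @_;
    my %setting;
    for my $line (@lines) {
        next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
        my ($key, $value) = $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/
            or die "bad config line: $line\n";
        $setting{$key} = $value;
    }
    return bless { setting => \%setting }, $class;
}

sub get {
    my ($self, $key) = @_;
    return $self->{setting}{$key};
}

package main;

use Test::More tests => 3;

my $config = My::Config->new('# database', 'db_host = localhost', '', 'db_port=5432');
is($config->get('db_host'), 'localhost', 'reads a value');
is($config->get('db_port'), '5432',      'tolerates missing spaces');
is($config->get('missing'), undef,       'unknown key returns undef');
```

A test suite this size runs in well under a second, which is the point: each extracted piece carries its own fast tests instead of adding to the one suite that runs forever.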
MyPAN
- Most large applications contain the following components in some form:
- One or more web based applications (mod_perl, etc.)
- Standalone job processing daemons
- A database abstraction layer
- Libraries which are extensions to C code
- In the early days of the application, it's hard to see the lines between different functional sections because you are focused on delivering features
- Take your next feature request, and extract the functionality from the main codebase into a module
MyPAN
- Use your operating system's packaging system to deploy code releases
- ppm, rpm, ports, ebuilds, etc.
- Check out Ovid on CPAN for rpm generation
- Creating your spec file, ebuild, etc. is part of development
- Most shops use a version control system for deployment
- That works well for a few servers, but not for a few dozen or a few hundred
- No need to push out a new release with 100k lines of code for a change to one job processing daemon
Social Engineering
- Programming teams are complex social dynamic systems
- It's natural for code authors to become emotionally attached to their code
- How do you explain to your coworker who has a Ph.D. in Astrophysics that you had to add 'use strict;' to his code when you fixed a bug the week he was out?
- Better yet, how do you get him to add strictures to his code in the future?
Social Engineering
- Promote ego-less programming practices
- Use Changes and README files to credit authors, but put an email list alias in the actual POD
- Lead by example, not by persuasion
- If someone writes some code that looks dodgy, write a test for it rather than touching their code
- Then when you find a bug, send them a polite email with a well written fix, and an example of the test passing under your change
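One low-friction way to record a suspected bug without touching a coworker's code is a Test::More TODO block: the failing test documents the problem but does not break the suite. The sketch below is illustrative only, and the average sub is a made-up example of "dodgy" code.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Hypothetical coworker code: works for normal input, but dies on an empty list.
sub average {
    my @n = @_;
    my $sum = 0;
    $sum += $_ for @n;
    return $sum / @n;    # division by zero when @n is empty
}

# Behaviour that already works is pinned down with a normal test.
is(average(2, 4), 3, 'average of two numbers');

# The suspected bug is recorded as a TODO test: it is reported, but a
# failure here does not make the test script fail.
TODO: {
    local $TODO = 'empty list currently dies with division by zero';
    eval { average() };
    is($@, '', 'empty list does not die');
}
```

Once the fix is in, the TODO test starts passing and the harness flags it as unexpectedly succeeding, which is a good moment to remove the TODO wrapper and send that polite email.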
Social Engineering
- Code reviews don't always work
- Chances are you are the Perl expert in the organization, so you have to remain humble
- What's easy for you might be hard for others
- Focus on stabilizing the code base through test writing and you'll win people over
- Bring Perl::Tidy and Perl::Critic into the codebase slowly, starting with your own code
Tools for success
- Version control - use a system that preserves code change history across file moves (CVS is not one of them)
- Persistence testing environment - the worst failures occur going from QA to production, not dev to QA
- Regression testing environment - http://sourceforge.net/projects/smolder
- Coffee, beer, cigarettes - convince your coworkers that life doesn't have to be full of software fire drills
Credits
- Those who had to suffer through the balls of mud I created
- Those who created balls of mud which I suffered through and gained wisdom from
- http://use.perl.org/~schwern/journal - paste-archives of mini-essays
- The great programmers I've had a chance to work with and who have shown me what writing good code is all about
- The creator of Perl::Critic for introducing me to the term 'ball of mud'
Thank you NPW 2007!
- These slides available at http://www.redhotpenguin.com/talks/npw2007/ball_of_mud.html
- Questions?
Shameless Plug
Need mod_perl / Perl / PostgreSQL consulting?
fred@redhotpenguin.com
just another mod_perl hacker