The perils of testing

08 February 2014
The title is somewhat tongue-in-cheek, because so many problems can be traced back to testing failures, but there is also a serious point. Testing is important, but I have seen disasters of misdirected testing as well as of under-testing. Worse still, I have seen the desire & demand for a testing culture mask more serious underlying problems behind tick-box exercises.

Why the interest?

The issue of testing is one that came close to sinking my application for my current job, because my previous company was one of those with no formal testing procedures, and the fact that I rolled my own Python scripts for testing and debugging purposes quite likely saved the day. While I appreciated the need for testing, I was unfamiliar with automating it, and getting exposure to this in practice was one of my motives for accepting the offer. Nevertheless I still saw systemic failures that were not out of line with what I had seen previously.

Regime failure

Fundamentally, testing failures can be put down to one of three factors. These are by and large human factors rather than technical ones, and hence non-trivial to address:
Builders make poor testers
Having the same person both create something and test it inevitably fails, because the cognitive gaps that led to the bugs also led to coinciding gaps in testing.
Lack of reference
What is to say that the test itself is correct? Without an authoritative reference, tests usually end up merely asserting whatever the actual behaviour is, rather than the correct behaviour.
Insufficient time
More often than not, at least in smaller companies, the pressure is to hammer in extra features rather than stop and consolidate.
The last of these is easy to understand as it is a breadth-versus-depth failure, but the other two I have seen in every single company I have worked in, including my current one. Of these, the lack of reference is the one I have seen time and time again (a sketch of it follows below), and in fact some of the philosophies I have been encouraged to follow land right on top of it, because the focus has been on procedure rather than intention.
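
As a hypothetical sketch of the lack-of-reference failure (the function and figures below are made up for illustration), consider a test whose expected value was lifted from the code's own output rather than from any specification:

    import unittest

    # Hypothetical helper with a silent truncation bug.
    def monthly_instalment(total_pence, months):
        # Integer division silently drops the remainder.
        return total_pence // months

    class TestInstalment(unittest.TestCase):
        def test_split(self):
            # The expected value was obtained by running the code and
            # pasting in whatever it returned, so the test merely
            # enshrines the actual behaviour: a penny goes missing,
            # and the suite stays green.
            self.assertEqual(monthly_instalment(100, 3), 33)

    if __name__ == "__main__":
        unittest.main()

Without an authoritative answer to "what should happen to the leftover penny?", the test cannot do anything other than bless whatever the code already does.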

Test-driven development

Test-driven development is nice in theory, but it is disastrous in practice. For debugging it works well, because there is an existing interface being presented with known-good and known-bad input that has corresponding expected output, but when writing new code it tends to cause more problems than it solves. Firstly, whether you write the test-cases first or not does not get around the builder-as-tester issue, and unless the infrastructure is already in place, writing tests first requires second-guessing the most suitable way to implement the new functionality. Some people tell me that proper test-driven development involves building tests around a prototype, and then using those tests to build the production code, but that is a complete non-starter unless the prototyping language is different from (i.e. much higher-level than) the production language.
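
To make the second-guessing point concrete, here is a hypothetical test written before any implementation exists; the module name, function signature and return shape are all guesses, and every guess that turns out to be wrong means rewriting the test rather than being protected by it:

    import unittest

    class TestParseConfig(unittest.TestCase):
        def test_reads_timeout(self):
            # Written test-first: parse_config() does not exist yet, so
            # the test has to guess that it will take a filename and
            # return a flat dictionary of settings. If the eventual
            # implementation prefers a file object, or nested sections,
            # these lines all have to change along with it.
            from configtool import parse_config  # hypothetical module
            settings = parse_config("app.ini")
            self.assertEqual(settings["timeout"], 30)

    if __name__ == "__main__":
        unittest.main()
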

Death by strangulation

The second problem is that implementing a feature together with its test-cases risks cementing in any idiosyncrasies that the given developer happened to create on the day. In my experience this is usually down to the use-case stories giving no proper indication of failure cases, so a load of test-cases end up asserting what would later become incorrect behaviour. Even worse in practice is the creation of test suites that fit the program implementation very tightly, going to the extent of excluding what would turn out to be valid cases, and often making any change a complexity minefield. Many times I have made a two-line change that took as many minutes to do, only for the corpus of tests to carry assumptions that took the next three days to audit and rectify. Worst of all is when, out of necessity, the test-rig diverges from the behaviour of the production systems, as then you get into the politics of having to game the test system.
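
As a hypothetical illustration of a suite fitted to the implementation rather than the requirement: both tests below pass today, but the over-fitted one quietly turns an implementation accident (the sorted ordering) into a requirement, so switching to an order-preserving deduplication breaks it even though no caller cares.

    import unittest

    def deduplicate(items):
        # Current implementation happens to return the items sorted.
        return sorted(set(items))

    class TestDeduplicate(unittest.TestCase):
        def test_overfitted(self):
            # Pins the exact output, including an ordering that was
            # never asked for, so a harmless refactoring fails it.
            self.assertEqual(deduplicate(["b", "a", "b"]), ["a", "b"])

        def test_behavioural(self):
            # Checks only what callers actually rely on: no duplicates,
            # nothing lost.
            self.assertEqual(set(deduplicate(["b", "a", "b"])), {"a", "b"})

    if __name__ == "__main__":
        unittest.main()
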

Test coverage

I think test coverage is actually quite a good metric, as it has the side-effect of flagging up dead code, but demanding a certain percentage of coverage is a rather blunt instrument laden with unintended consequences. The problem is that getting high coverage often requires hitting the more paranoid sanity checks, so focusing entirely on the headline percentage merely encourages people to strip these checks out and play to the implementation, rather than do any actual cross-checking against desirable behaviour.
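
A hypothetical example of the sort of paranoid check that drags the headline number down: the guard below only fires on input the rest of the system should never produce, so the cheapest way to reach a coverage target is to delete it, which is exactly the wrong incentive.

    def apply_discount(price, rate):
        # Defensive sanity check: no well-behaved caller ever passes a
        # rate outside [0, 1], so ordinary tests never execute this
        # branch and it shows up as uncovered in the report.
        if rate < 0 or rate > 1:
            raise ValueError("discount rate out of range: %r" % rate)
        return price * (1 - rate)
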

...and the kicker

I have been in development teams that took flak for not having testers, when it turned out that other teams with testers were the ones committing code that ended up breaking the build. In other cases too much emphasis was put on making & passing tests, whereas the effort should have gone into design foresight. The latter is not something that procedures can bring in.