Unit Testing Guidelines

Overview

Unit testing is both overrated and underutilized by many programming teams. While unit tests can be an important tool for assuring code quality, a unit testing policy based on a blind "percentage of code covered" metric is almost certain to do more harm than good.

Consider the following contrived class to unit test:

// Mock class under test
/**
 * Mock class used to demonstrate unit testing best practices.
 */
public class Divider {
    /**
     * Divide numerator by denominator.
     *
     * This contrived class defines a single method with only a single line.
     * Note, however, that the single line represents a multitude of use
     * cases, including:
     *
     * * Dividing a number by itself should always return 1.0
     *
     * * Operands of the same magnitude but opposite signs should always
     *   return -1.0
     *
     * * Division by 0.0 should produce an infinite value of the same sign
     *   as the numerator
     *
     * @param numerator
     *            The numerator.
     *
     * @param denominator
     *            The denominator.
     *
     * @return The quotient.
     */
    public float divide(final float numerator, final float denominator) {
        return numerator / denominator;
    }
}

It is a class with only a single method, and that method consists of a single arithmetic operation. Note, however, the multiplicity of use cases enumerated in the comment block for even this absurdly trivial "library class."

Now consider a "test suite" consisting only of the following method:

// Don't do this!
import static org.junit.Assert.assertEquals;

import org.junit.Test;

/**
 * Counter-example of a "test suite" that demonstrates 100% code coverage but
 * a completely inadequate testing strategy.
 *
 * This shows one of the many defects of code coverage as a unit test quality
 * metric.
 */
public class BadDivider1Test {
    /**
     * Hugely inadequate test plan that still achieves 100% code coverage.
     * One of the many reasons that code coverage metrics are
     * counter-productive, in this case through giving a false sense of
     * security.
     */
    @Test
    public void divide1Test() {
        final Divider divider = new Divider();
        assertEquals(0.5f, divider.divide(1.0f, 2.0f), 0.0f);
    }
}

That single method achieves 100% code coverage but tests only one of the many actual use cases. Relying only on the automatable "percentage of code covered" metric in such a case would result in a successful build while not actually testing any of the cases most likely to need verification for a division operator: correct handling of signs and zero.

A "test suite" consisting of the following single test method:

// Don't do this, either!
import static org.junit.Assert.assertEquals;

import org.junit.Test;

/**
 * Another counter-example of a "test suite," this one achieving adequate use
 * case coverage but using extremely poor coding style.
 */
public class BadDivider2Test {
    /**
     * This style makes it very difficult for a code reviewer to tell which
     * use cases are and are not covered, and the first failed assertion
     * skips any remaining ones. This is another of the many reasons that
     * code coverage metrics are counter-productive, in this case due to
     * fragile code that is difficult to understand or maintain and yet
     * still passes automated build checks.
     */
    @Test
    public void divide2Test() {
        final Divider divider = new Divider();
        assertEquals(0.5f, divider.divide(1.0f, 2.0f), 0.0f);
        assertEquals(-0.5f, divider.divide(1.0f, -2.0f), 0.0f);
        assertEquals(1.0f, divider.divide(10.0f, 10.0f), 0.0f);
        assertEquals(1.0f, divider.divide(-10.0f, -10.0f), 0.0f);
        assertEquals(-1.0f, divider.divide(10.0f, -10.0f), 0.0f);
        assertEquals(-1.0f, divider.divide(-10.0f, 10.0f), 0.0f);
        assertEquals(Float.POSITIVE_INFINITY, divider.divide(10.0f, 0.0f), 0.0f);
        assertEquals(Float.NEGATIVE_INFINITY, divider.divide(-10.0f, 0.0f), 0.0f);
    }
}

covers more use cases, but does so in a coding style that makes it very difficult to verify which use cases are and are not covered during code review. This style is also extremely difficult to maintain over the life of a software component, for the same reason that it is burdensome to review. In addition, the first failed assertion causes any subsequent assertions in the method to be skipped, slowing down the edit-compile-test cycle during development.

Here is how a test suite for the Divider class should look:

// Do this, instead!
import static org.junit.Assert.assertEquals;

import org.junit.Test;

/**
 * Unit tests for {@link Divider}.
 *
 * This class defines a number of test methods comprising a good unit test
 * suite for a contrived piece of code under test.
 */
public class GoodDividerTest {

    // Single-use-case test methods, whose names and comment blocks clearly
    // denote their purpose. This makes it far easier for a code reviewer to
    // understand and assess the test plan, for subsequent developers to
    // update the tests when the code under test changes, etc. Note that most
    // test runners will attempt to run all of these tests even if any of
    // them fail, thus producing more actionable information on each test
    // run.

    /**
     * Test that -f / -f is 1.0.
     *
     * This is a good example of a single unit test method from a
     * comprehensive test suite.
     */
    @Test
    public void negativeOverNegativeTest() {
        final Divider divider = new Divider();
        assertEquals(1.0f, divider.divide(-10.0f, -10.0f), 0.0f);
    }

    /**
     * Test that -f / f is -1.0.
     *
     * This is a good example of a single unit test method from a
     * comprehensive test suite.
     */
    @Test
    public void negativeOverPositiveTest() {
        final Divider divider = new Divider();
        assertEquals(-1.0f, divider.divide(-10.0f, 10.0f), 0.0f);
    }

    /**
     * Test that -f / 0.0 is negative infinity.
     *
     * This is a good example of a single unit test method from a
     * comprehensive test suite.
     */
    @Test
    public void negativeOverZeroTest() {
        final Divider divider = new Divider();
        assertEquals(Float.NEGATIVE_INFINITY, divider.divide(-10.0f, 0.0f), 0.0f);
    }

    /**
     * Test that 1.0 / -2.0 is -0.5.
     *
     * This is a good example of a single unit test method from a
     * comprehensive test suite.
     */
    @Test
    public void negativeQuotientTest() {
        final Divider divider = new Divider();
        assertEquals(-0.5f, divider.divide(1.0f, -2.0f), 0.0f);
    }

    /**
     * Test that f / -f is -1.0.
     *
     * This is a good example of a single unit test method from a
     * comprehensive test suite.
     */
    @Test
    public void positiveOverNegativeTest() {
        final Divider divider = new Divider();
        assertEquals(-1.0f, divider.divide(10.0f, -10.0f), 0.0f);
    }

    /**
     * Test that f / f is 1.0.
     *
     * This is a good example of a single unit test method from a
     * comprehensive test suite.
     */
    @Test
    public void positiveOverPositiveTest() {
        final Divider divider = new Divider();
        assertEquals(1.0f, divider.divide(10.0f, 10.0f), 0.0f);
    }

    /**
     * Test that f / 0.0 is positive infinity.
     *
     * This is a good example of a single unit test method from a
     * comprehensive test suite.
     */
    @Test
    public void positiveOverZeroTest() {
        final Divider divider = new Divider();
        assertEquals(Float.POSITIVE_INFINITY, divider.divide(10.0f, 0.0f), 0.0f);
    }

    /**
     * Test that 1.0 / 2.0 is 0.5.
     *
     * This is a good example of a single unit test method from a
     * comprehensive test suite.
     */
    @Test
    public void positiveQuotientTest() {
        final Divider divider = new Divider();
        assertEquals(0.5f, divider.divide(1.0f, 2.0f), 0.0f);
    }
}

Note that there is a separate test method for each individual use case, and the name of each test method makes it clear which use case that method tests. This is not only more readable and maintainable; most test runners will also run all such test methods even if some of them fail.

Note
The same single line of code is executed many times during each test run in order to cover multiple use cases, rather than executing the minimal number of use cases necessary to achieve some particular level of coverage. Far from being "burdensome" or "redundant," this is essential to the purpose of unit testing. Any developer unwilling to expend the level of effort necessary to unit test in this style should not write any unit tests at all, since doing so is not merely a waste of time but will do more harm than good in the long run.

Note how many more test methods than methods under test there are, even for such an unrealistically trivial class as Divider. Any test suite with a more or less 1:1 ratio between test functions and functions under test almost certainly fails to test important use cases, packs too many tests into too few test functions or, more likely, both. For another equally contrived example, see the unit tests for the Pythagoras class described by Using Doxygen; for a more realistic example, see the unit tests for the class library described by Parsing Symbolic Logic Formulas.

Note
Complete use-case coverage will generally take care of code coverage on its own. For example, if you have an uncovered private method after properly exercising all use cases, what is that method actually for?
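
As a purely hypothetical sketch (the SafeDivider class below is invented for illustration and appears nowhere else in this document), suppose the class under test delegated argument checking to a private helper. Once the public method's use cases are fully exercised, the helper is covered automatically; if it were not, that would be evidence of dead code rather than of a missing test:

// Covered transitively by use-case tests
/**
 * Hypothetical variant of the class under test with a private helper.
 */
public class SafeDivider {
    /**
     * Divide numerator by denominator, rejecting non-finite operands.
     */
    public float divide(final float numerator, final float denominator) {
        requireFinite(numerator);
        requireFinite(denominator);
        return numerator / denominator;
    }

    // Exercised by every use-case test of divide(); there is no need to
    // widen its accessibility just to get it "covered."
    private static void requireFinite(final float operand) {
        if (Float.isNaN(operand) || Float.isInfinite(operand)) {
            throw new IllegalArgumentException("operand must be finite");
        }
    }
}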

Overlooked use cases are only one of many potential problems with "percentage of code covered" as a unit test quality metric. Another is the almost overwhelming temptation to distort a design to enable higher code coverage. Encapsulation and specialization are hallmarks of good object-oriented design. Making methods more accessible solely for the sake of unit testing, or introducing complexity solely to enable the use of tools like mocking frameworks, results in poor software design and implementation.

Another issue with "percentage of code covered" metrics is that not all lines of code need unit testing, either because testing them would be more a test of the compiler and frameworks in use than of the application, or because some logic requires set-up or context that is beyond the scope of unit testing.

For example, it is simply a waste of time to write, and a waste of build server resources to execute, unit tests for value objects like Java "beans" with only getters and setters and similarly trivial logic.
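
For example, a hypothetical value object like the following (invented here purely for illustration) contains nothing worth testing; a unit test for it would really be testing the compiler and runtime:

// Not worth unit testing
/**
 * Hypothetical Java "bean" consisting solely of a getter and a setter.
 */
public class Quotient {
    private float value;

    public float getValue() {
        return value;
    }

    public void setValue(final float value) {
        this.value = value;
    }
}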

Note
Exactly such trivial logic was used for the examples in this document only for tutorial purposes, not because such an extraordinarily simple "library class" would actually exist or need unit testing in the real world.

Similarly, an API devoted to logging and reporting might require too much set-up and impose too much test-time overhead to be worth unit testing. In such cases, that much more effort should be expended in integration and acceptance testing.

Warning
Beware of integration tests masquerading as unit tests!

Finally, some languages like Java require that certain kinds of exceptions be declared in order to be thrown, and require explicit handling wherever they are declared. Good error detection and handling is worth a tutorial of its own, but suffice it to say here, in the context of unit testing, that circumstances often arise in languages like Java where the programmer is required to write code to "catch" an exception that will almost certainly never be thrown in the real world, or that would be nearly impossible to arrange to be thrown during unit tests. Such catch handlers will always be marked as "uncovered" from the point of view of "percentage of code covered," and yet the fact that the code compiles at all shows that there is some error handling strategy in the design. Whether or not it is a good strategy can only be verified by manual code review and is not a concern for unit testing.
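
As a hypothetical illustration (the Utf8 class below is invented for this purpose), consider the String constructor that takes a charset name: it declares a checked exception that can never actually be thrown for "UTF-8", since every conforming JVM is required to support that encoding, yet the compiler still demands a handler:

// A catch handler no unit test can reach
import java.io.UnsupportedEncodingException;

/**
 * Hypothetical utility demonstrating a compiler-mandated catch block that is
 * unreachable in practice and therefore permanently "uncovered."
 */
public final class Utf8 {
    private Utf8() {
    }

    public static String decode(final byte[] bytes) {
        try {
            return new String(bytes, "UTF-8");
        } catch (final UnsupportedEncodingException e) {
            // Cannot happen: UTF-8 support is mandatory on every JVM, so no
            // unit test can exercise this line and coverage tools will
            // always flag it.
            throw new IllegalStateException(e);
        }
    }
}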

How Not to Write Unit Tests

Do not:

  1. Completely implement your software component
  2. Write unit tests that pass based solely on inspecting the code as written
  3. Where necessary, change the accessibility of existing methods from private or protected to package or public, and make other changes, solely for the purpose of making it easier to increase code coverage
  4. Stop once you have fulfilled your percentage of code covered requirement

Sound familiar? The preceding is how most teams new to unit testing first start adding unit tests to their workflow.

Warning
If the preceding is your approach to unit testing you would be far better off not unit testing at all for reasons discussed below.

Unit Testing Guidelines

Instead:

  1. Write unit tests according to well-defined use cases before or in parallel with the code under test – if in parallel, the unit tests should be written by a different developer than the one writing the code under test
  2. Use the same principles to guide the design and implementation of unit tests as for "functional" code – i.e. each test method should focus on a single well-defined use case, use naming conventions that make each test's purpose clear etc.
  3. Each individual unit test method should take only a tiny fraction of a second to execute on the build server – elaborate tests that require a lot of time and resources are probably integration tests rather than unit tests to start with
  4. Peer review unit tests just as stringently as "functional" code to verify adequate coverage of use cases – in which case automated "percentage of code covered" metrics become moot

Mocking Frameworks

"But wait," I hear the gentle reader cry, "what about mocking frameworks?"

Mocking frameworks can have their place in an overall unit testing strategy but must be used with caution for all the reasons already listed. If your primary reason for using a mocking framework is to increase the percentage of code covered – well, you should already have a fair idea of how important or useful the author considers that particular metric to be.

In addition, the correctness of unit tests that rely on mock objects is only as good as the correctness of the mock objects' simulated behavior. If you rely on mock objects, be sure to have them peer reviewed by members of the teams responsible for developing the corresponding actual APIs – or, better yet, only use mock objects supplied by the same developers responsible for the actual functionality for which they provide stubs.

Finally, unit tests that rely on mock objects too easily veer into the realm of integration tests, which should be performed in a separate phase of the build using different techniques from unit tests. Code that has been tested only against mock APIs is not actually known to work as intended, or even to work correctly at all.

To quote the Mockito website itself:

Remember

  • Do not mock types you don’t own
  • Don’t mock value objects
  • Don’t mock everything
  • Show love with your tests!

One area in which mocking frameworks can be used to good purpose is in testing the behavior of code that relies on dependency-injection frameworks like Spring. Simple mock objects can be used to verify that the expected interfaces are correctly instantiated – i.e. that the dependencies are correctly plumbed together at the level of the framework – separate from testing the behavior of the injected dependencies themselves.
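
As a minimal sketch of that technique – assuming JUnit 4 and Mockito on the test classpath, and with Repository and ReportService as hypothetical stand-ins for framework-wired collaborators rather than code from any real API – a mock can confirm that the dependency is plumbed through correctly:

// Verify the plumbing, not the injected dependency's behavior
import static org.junit.Assert.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import org.junit.Test;

/**
 * Hypothetical collaborator that a framework like Spring would inject.
 */
interface Repository {
    int fetch(String key);
}

/**
 * Hypothetical class under test, written for constructor injection so that a
 * mock can stand in for the real Repository during unit tests.
 */
class ReportService {
    private final Repository repository;

    ReportService(final Repository repository) {
        this.repository = repository;
    }

    int report(final String key) {
        return repository.fetch(key);
    }
}

public class ReportServiceTest {
    /**
     * Test that ReportService delegates to its injected Repository; the
     * repository's own behavior is deliberately out of scope here.
     */
    @Test
    public void delegatesToRepositoryTest() {
        final Repository repository = mock(Repository.class);
        when(repository.fetch("answer")).thenReturn(42);
        final ReportService service = new ReportService(repository);
        assertEquals(42, service.report("answer"));
        verify(repository).fetch("answer");
    }
}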

Design for Testability

Well-designed and implemented code will be easy to test. As already noted, this does not mean making compromises to a design for the sake of unit testing. Rather, it refers to the fact that techniques like dependency injection, which form the basis of good software design, also enhance testability.

For example, presentation-layer code generally cannot be unit tested due to its tight coupling with UI toolkits and its focus (pun intended) on visual representation. Design patterns like MVC are intended, among other things, to keep the presentation layer as thin as practical, thereby increasing the surface area for unit testing. This can be regarded simply as a special case of dependency injection, where the models and controllers are dependencies injected into views. Even if there is no practical way to test the view code, putting the bulk of the application logic into models and controllers increases the percentage of the overall application code that is amenable to unit testing.
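
As a hypothetical sketch of that division of labor (the class below is invented for illustration), a controller can hold the formatting logic and be unit tested without any UI toolkit, while the view that ultimately displays the resulting string remains thin, untested glue:

// Application logic extracted from the presentation layer
/**
 * Hypothetical controller: pure logic, unit testable in isolation from any
 * view that displays its output.
 */
public class TemperatureController {
    /**
     * Convert a Celsius reading into the label text the view will display.
     */
    public String labelFor(final double celsius) {
        final double fahrenheit = celsius * 9.0 / 5.0 + 32.0;
        return String.format("%.1f °F", fahrenheit);
    }
}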

Fault Injection, Mutation Testing Etc.

If you are grimly determined to use some sort of automatable unit test metric despite all of the above, there are tools and techniques that produce far more meaningful results than simple "percentage of code covered." Note, however, that these techniques require much more up-front analysis and effort to implement correctly than using a simple unit testing framework like JUnit, and they still require peer review to verify that they have been implemented correctly and are producing meaningful results.

Some of these techniques are also better supported by some programming languages and run-time environments than others. Mutation testing, for example, is theoretically applicable to any programming language or platform, but for practical purposes is only really supported in a general-purpose way in environments that natively support code manipulation at run time, e.g. via the same "instrumentation" API in Java that supports aspect-oriented programming, unit test coverage analysis etc.

There is also a basic cost/benefit trade-off in the use of such tools. It is important to weigh carefully whether the learning curve and the overhead of developing and executing unit tests with such techniques are actually justified for a particular team of developers responsible for a given body of code. The hallmark of a good unit-testing policy is that it is sufficiently lightweight not to be perceived as overly burdensome by developers nor too taxing on build server resources. It must also produce a sufficiently low percentage of false results – whether positive or negative – to avoid a false sense of security regarding the level of code quality while not wasting too much time chasing down what turn out to be non-issues.

Warning
Again, beware the slippery slope from unit testing to integration or user acceptance testing! Leave the latter to the experts in QA!

Summary

A well-designed unit test suite typically has many seemingly "redundant" test methods for a few lines of code in order to cover multiple use cases, while leaving some "housekeeping" code completely uncovered. Follow good software design and implementation principles in both your "functional" code and your test suites. Focusing on the quantity of unit tests rather than on the quality of test cases is worse than not unit testing at all due to:

  • the false sense of security engendered by misleading "percentage of code covered" metrics
  • designs distorted merely to make code easier to cover
  • fragile, monolithic test code that is difficult to review and maintain
  • developer time and build server resources wasted on tests that verify nothing of importance