Kirk Rader 1.0-SNAPSHOT
Unit testing guidelines.
Unit testing is both over-rated and under-utilized by many programming teams. While unit tests can be an important tool for assuring code quality, a unit testing policy based on a blind "percentage of code covered" metric is almost certain to do more harm than good.
Consider the following contrived class to unit test:
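(The class listing itself is not reproduced in this extract. The sketch below assumes, consistent with the discussion that follows, a class named Divider whose single method performs an integer division; the exact signature is a guess.)

```java
// Sketch of the contrived class under discussion. The class name
// Divider appears in the surrounding text; the method name and
// signature are assumptions.
public class Divider {

    // A single method consisting of a single arithmetic operation.
    public static int divide(int dividend, int divisor) {
        return dividend / divisor;
    }
}
```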
It is a class with only a single method, where that method consists of a single arithmetic operation. But note the multiplicity of use cases for this absurdly trivial "library class": correct handling of positive and negative operands, a zero dividend, and a zero divisor.
Now, consider a "test suite" that consisted only of the following method:
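(The original listing is not reproduced in this extract. Plain Java stands in for JUnit below — the @Test annotation is shown as a comment — and a stand-in divide method is inlined.)

```java
// Sketch of a one-method "test suite" that reports full coverage
// while exercising only a single use case.
public class SingleCaseDividerTest {

    static int divide(int dividend, int divisor) { // stand-in for Divider.divide
        return dividend / divisor;
    }

    // @Test in a real JUnit suite
    static void testDivide() {
        // Executes the only line of the method under test, so coverage
        // tools report 100% -- yet signs and zero are never exercised.
        if (divide(6, 3) != 2) {
            throw new AssertionError("6 / 3 should be 2");
        }
    }

    public static void main(String[] args) {
        testDivide();
        System.out.println("1 test passed, 100% coverage reported");
    }
}
```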
That single method achieves 100% code coverage but tests only one of the many actual use cases. Relying only on the automatable "percentage of code covered" metric in such a case would result in a successful build while not actually testing any of the cases most likely to need verification for a division operator: correct handling of signs and zero.
A "test suite" consisting of the following single test method:
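(Again the original listing is not reproduced here; the plain-Java sketch below, with a stand-in divide method and JUnit's @Test shown as a comment, illustrates the monolithic style being criticized.)

```java
// Sketch of a single test method packing many assertions together.
public class MonolithicDividerTest {

    static int divide(int dividend, int divisor) { // stand-in for Divider.divide
        return dividend / divisor;
    }

    static void assertEquals(int expected, int actual) {
        if (expected != actual) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }

    // @Test in a real JUnit suite -- many cases in one method; the
    // first failure aborts all the rest, and a reviewer cannot easily
    // see which use cases are and are not covered.
    static void testDivide() {
        assertEquals(2, divide(6, 3));
        assertEquals(-2, divide(-6, 3));
        assertEquals(-2, divide(6, -3));
        assertEquals(2, divide(-6, -3));
        assertEquals(0, divide(0, 3));
        try {
            divide(6, 0);
            throw new AssertionError("expected ArithmeticException");
        } catch (ArithmeticException expected) {
            // division by zero correctly signalled
        }
    }

    public static void main(String[] args) {
        testDivide();
        System.out.println("all assertions passed");
    }
}
```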
covers more use cases, but does so in a coding style that makes it very difficult to verify during code review which use cases are and are not covered. This style is also extremely difficult to maintain over the life of a software component, for the same reason that it is burdensome to review. In addition, the first failed assertion causes all subsequent tests to be skipped, slowing down the edit-compile-test cycle during development.
Here is how a test suite for the Divider class should look:
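(Once more a plain-Java sketch, since the original listing is not included in this extract; in a real JUnit suite each method below would carry the @Test annotation, and the suite would call Divider.divide rather than an inlined stand-in.)

```java
// Sketch of the recommended suite shape: one test method per use
// case, each named for the case it covers.
public class DividerTest {

    static int divide(int dividend, int divisor) { // stand-in for Divider.divide
        return dividend / divisor;
    }

    static void assertEquals(int expected, int actual) {
        if (expected != actual) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }

    // One method per use case; the name states exactly what is tested.
    static void testPositiveDividendPositiveDivisor() { assertEquals(2, divide(6, 3)); }

    static void testNegativeDividendPositiveDivisor() { assertEquals(-2, divide(-6, 3)); }

    static void testPositiveDividendNegativeDivisor() { assertEquals(-2, divide(6, -3)); }

    static void testNegativeDividendNegativeDivisor() { assertEquals(2, divide(-6, -3)); }

    static void testZeroDividend() { assertEquals(0, divide(0, 3)); }

    static void testZeroDivisorThrows() {
        try {
            divide(6, 0);
            throw new AssertionError("expected ArithmeticException");
        } catch (ArithmeticException expected) {
            // division by zero correctly signalled
        }
    }

    public static void main(String[] args) {
        testPositiveDividendPositiveDivisor();
        testNegativeDividendPositiveDivisor();
        testPositiveDividendNegativeDivisor();
        testNegativeDividendNegativeDivisor();
        testZeroDividend();
        testZeroDivisorThrows();
        System.out.println("6 tests passed");
    }
}
```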
Note that there is a separate test method for each individual use case. The name of each test method makes it clear which use case that method tests. This is not only more readable and maintainable, most test runners will run all such test methods even if some of them fail.
Note how many more test methods than methods under test there are, even for such an unrealistically trivial class as Divider. Any test suite with a more or less 1:1 ratio between test functions and functions under test almost certainly fails to test important use cases, packs too many tests into too few test functions, or, more likely, both. For another equally contrived example, see the unit tests for the Pythagoras class described by Using Doxygen; for a more realistic example, see the unit tests for the class library described by Parsing Symbolic Logic Formulas.
Overlooked use cases are only one of many potential problems with "percentage of code covered" as a unit test quality metric. Another is the almost overwhelming temptation to distort a design to enable higher code coverage. Encapsulation and specialization are hallmarks of good object-oriented design. Making methods accessible only for the sake of unit testing, or introducing extra complexity just to enable the use of tools like mocking frameworks, results in poor software design and implementation.
Another issue with "percentage of code covered" metrics is that not all lines of code need unit testing, either because testing them would exercise the compiler and frameworks in use more than the application itself, or because some logic requires set-up or context that is beyond the scope of unit testing.
For example, it is simply a waste of time to write, and a waste of build server resources to execute, unit tests for value objects like Java "beans" that consist only of setters, getters, and similarly trivial logic.
Similarly, an API devoted to logging and reporting might require too much set-up and impose too much test-time overhead to be worth unit testing. In such cases, that much more effort should be expended in integration and acceptance testing.
Finally, some languages like Java require that some kinds of exceptions be declared in order to be thrown and require some explicit handling when declared. The topic of good error detection and handling is worth a tutorial on its own, but suffice it to say here in the context of unit testing that circumstances often arise in languages like Java where the programmer is required to write code to "catch" an exception that will almost certainly never be thrown in the real world or would be nearly impossible to arrange to be thrown during unit tests. Such catch handlers would always be marked as "uncovered" from the point of view of "percentage of code covered" and yet the fact that the code compiles at all shows that there is some error handling strategy in the design. Whether or not that is a good strategy is a matter that can only be verified via manual code review and is not a concern for unit testing.
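As an illustration of the kind of code in question, consider this sketch: constructing a String from bytes using the charset name "UTF-8" forces a catch handler for UnsupportedEncodingException, even though the Java platform guarantees that UTF-8 is always available, so the handler is effectively unreachable.

```java
import java.io.UnsupportedEncodingException;

public class CheckedExceptionExample {

    public static String bytesToString(byte[] bytes) {
        try {
            return new String(bytes, "UTF-8");
        } catch (UnsupportedEncodingException cannotHappen) {
            // The platform guarantees UTF-8 support, so this branch can
            // never execute in practice -- yet coverage tools will
            // forever report it as uncovered.
            throw new IllegalStateException(cannotHappen);
        }
    }

    public static void main(String[] args) {
        System.out.println(bytesToString(new byte[] {104, 105})); // prints "hi"
    }
}
```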
- Change access modifiers from protected to package or public, and make other changes, solely for the purpose of making it easier to increase code coverage
Sound familiar? The preceding is how most teams new to unit testing first start adding unit tests to their workflow.
"But wait," I hear the gentle reader cry, "what about mocking frameworks?"
Mocking frameworks can have their place in an overall unit testing strategy but must be used with caution for all the reasons already listed. If your primary reason for using a mocking framework is to increase percentage of code covered – well, you should already have a fair idea of how important or useful the author considers that particular metric to be.
In addition, the correctness of unit tests that rely on mock objects is only as good as the correctness of the mock objects' simulated behavior. If you rely on mock objects, be sure to have them peer reviewed by members of the teams responsible for developing the corresponding actual APIs – or, better yet, only use mock objects supplied by the same developers responsible for the actual functionality for which they provide stubs.
Finally, unit tests that rely on mock objects too easily veer into the realm of integration tests, which should be performed in a separate phase of the build using different techniques from unit tests. Code that has been tested only against mock APIs is not actually known to work as intended, or even to work at all.
To quote the Mockito website itself:
- Do not mock types you don’t own
- Don’t mock value objects
- Don’t mock everything
- Show love with your tests!
One area in which mocking frameworks can be used to good purpose is in testing the behavior of code that relies on dependency-injection frameworks like Spring. Simple mock objects can be used to verify that the expected interfaces are correctly instantiated – i.e. that the dependencies are correctly plumbed together at the level of the framework, separate from testing the behavior of the injected dependencies, themselves.
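As a minimal sketch of that idea (all names below are hypothetical, and plain constructor injection stands in for a framework like Spring), a hand-rolled mock can confirm that a dependency was wired and invoked, without testing its real behavior:

```java
// Verifying wiring, not behavior: the mock records only that it was
// invoked. AuditLog and TransferService are invented illustrations.
public class WiringTest {

    interface AuditLog {
        void record(String message);
    }

    // The "production" class into which the dependency is injected.
    static class TransferService {
        private final AuditLog log;

        TransferService(AuditLog log) {
            this.log = log;
        }

        void transfer(int amount) {
            log.record("transfer " + amount);
        }
    }

    // Minimal mock: notes whether it was called at all.
    static class MockAuditLog implements AuditLog {
        boolean invoked = false;

        public void record(String message) {
            invoked = true;
        }
    }

    public static void main(String[] args) {
        MockAuditLog mock = new MockAuditLog();
        new TransferService(mock).transfer(100);
        // Confirms only that the dependency was plumbed together and
        // invoked -- the behavior of a real AuditLog is tested elsewhere.
        if (!mock.invoked) {
            throw new AssertionError("dependency not wired");
        }
        System.out.println("wiring verified");
    }
}
```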
Well designed and implemented code will be easy to test. As already noted, this does not mean making compromises to a design for the sake of unit testing. Rather, it refers to the fact that using techniques like dependency injection that form the basis of good software design also enhances testability.
For example, presentation-layer code generally cannot be unit tested due to its tight coupling with UI toolkits and its focus (pun intended) on visual representation. Design patterns like MVC are intended, among other things, to keep the presentation layer as thin as practical and so increase the surface area for unit testing. This can be regarded simply as a special case of dependency injection, where the models and controllers are dependencies injected into views. Even if there is no practical way to test the view code, putting the bulk of the application logic into models and controllers increases the percentage of the overall application code that is amenable to unit testing.
If you are grimly determined to use some sort of automatable unit test metric despite all of the above, there are tools and techniques that produce far more meaningful results than simple "percentage of code covered." Note, however, that these techniques require much more up-front analysis and effort to implement correctly than using a simple unit testing framework like JUnit, and still require peer review to verify that they have been implemented correctly and are producing meaningful results.
Some of these techniques also are better supported by some programming languages and run-time environments than others. Mutation testing, for example, is theoretically applicable to any programming language or platform, but for practical purposes is only really supported in a general-purpose way in environments that natively support code-manipulation at run time, e.g. using the same "instrumentation" API in Java that supports aspect-oriented programming, unit-test coverage analysis etc.
There is also a basic cost/benefit trade-off to the use of such tools. It is important to weigh carefully whether the learning curve and overhead of developing and executing unit tests using such techniques are actually justified for a particular team of developers responsible for a given body of code. The hallmark of a good unit-testing policy is that it is sufficiently lightweight not to be perceived as overly burdensome by developers nor too taxing on build server resources. It must also produce a sufficiently low percentage of false results – whether positive or negative – to avoid a false sense of security regarding the level of code quality while not wasting too much time chasing down what turn out to be non-issues.
A well-designed unit test suite typically has many seemingly "redundant" test methods for a few lines of code in order to cover multiple use cases while leaving some "housekeeping" code completely uncovered. Follow good software design and implementation principles in both your "functional" code and your test suites. Focusing on the quantity of unit tests rather than on the quality of test cases is worse than not unit testing at all due to:
- The temptation to test the wrong things, in the wrong way, merely to pump up coverage while ignoring essential test cases because coverage thresholds have already been met
Only manual peer review can verify the quality of unit tests, rather than their mere presence. Even using more sophisticated code coverage metrics than simple "percentage of lines / instructions covered," e.g. JaCoCo's "complexity coverage," is no substitute for substantive peer reviews, for the same reasons that static analysis tools like SonarQube are not sufficient, by themselves, for a team to assure conformance to maintainability standards.