Reputation: 10008
If you (or your organization) aspire to thoroughly unit test your code, how do you measure the success or quality of your efforts?
Upvotes: 43
Views: 16209
Reputation: 1611
The concept of mutation testing seems promising as a way to measure (test?) the quality of your unit tests. Mutation testing basically means making small "mutations" to your production code and then seeing if any unit test fails. A small mutation typically means changing an `and` to an `or`, or a `<` to a `<=`. If one or more unit tests fail, the "mutant" was caught. If the mutant survives your unit test suite, you missed a test. When I apply mutation testing to code with 100% line and branch coverage, it typically still finds a few spots where I missed tests.
See https://en.wikipedia.org/wiki/Mutation_testing for a description of the concept and links to tools.
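To make this concrete, here is a minimal sketch of a mutant that only a boundary test can kill. The `qualifiesForSeniorDiscount` method and the JUnit 4 tests are hypothetical, purely for illustration:

```java
import static org.junit.Assert.assertTrue;
import org.junit.Test;

public class DiscountTest {
    // Hypothetical production code under test.
    static boolean qualifiesForSeniorDiscount(int age) {
        return age >= 65;  // a mutation tool might change >= to >
    }

    @Test
    public void age66Qualifies() {
        assertTrue(qualifiesForSeniorDiscount(66));  // both >= and > pass here, so the mutant survives
    }

    // This boundary test is the one that kills the ">" mutant:
    // original: 65 >= 65 is true; mutant: 65 > 65 is false.
    @Test
    public void age65Qualifies() {
        assertTrue(qualifiesForSeniorDiscount(65));
    }
}
```

Line coverage alone would already be satisfied by the first test, yet the mutant survives it; that gap is exactly what mutation testing exposes.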
Upvotes: 3
Reputation: 426
Mutation testing is the foolproof way. Learn more about it at https://pitest.org/ and at https://github.com/vmzakharov/mutate-test-kata/blob/master/MutationTestKata.pdf
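If you use PIT with Maven, the documented way to run the analysis on a project that has the pitest-maven plugin configured is `mvn org.pitest:pitest-maven:mutationCoverage`; the surviving mutants it reports point straight at the tests you are missing.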
Upvotes: 0
Reputation: 39354
An additional technique I try to use is to partition your code into two parts. I've recently blogged about it here. The short description is to maintain your production code in two sets of libraries where one set (hopefully the larger set) has 100% line coverage (or better if you can measure it) and the other set (hopefully a tiny amount of code) has 0% coverage, yes zero percent coverage.
Your designs should allow this partitioning. This should make it easy to see the code that is not covered. Over time you may have ideas about how to move code from the smaller set to the larger set.
Upvotes: 1
Reputation: 41
Monitoring code coverage rates can be useful, but instead of focusing on an arbitrary target rate (80%? 90%? 100%?), I have found it more useful to aim for a positive trend over time.
Upvotes: 3
Reputation: 5506
I think one of the best practices for unit tests is this: do not expect to reach 100% code coverage unless you develop mission-critical software. Reaching that level can be very costly, and for most projects it will not be worth the effort.
Upvotes: 3
Reputation: 19795
If your primary way of measuring test quality is some automated metric, you've already failed.
Metrics can be misleading, and they can be gamed. And if a metric is the primary (or, worse yet, the only) means of judging quality, it will be gamed (perhaps unintentionally).
Code coverage, for example, is deeply misleading, because 100% code coverage is nowhere near complete test coverage. Likewise, a figure like "80% code coverage" means little without context. If that coverage falls on the most complex bits of code and misses only code so simple it can be verified by eye, that's significantly better than coverage biased the other way around.
Also, it's important to distinguish between the test-domain of a test (its feature set, essentially) and its quality. Test quality is not determined by how much it tests, just as code quality isn't determined by a laundry list of features. Test quality is determined by how well a test does its job of testing. That's actually very difficult to sum up in an automated metric.
The next time you go to write a unit test, try this experiment: see how many different ways you can write it such that it has the same code coverage and tests the same code. See whether it's possible to write a very poor test that meets these criteria, and a very good test as well. I think you may be surprised at the results.
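As a hedged illustration of that experiment (the `add` method and both JUnit 4 tests are hypothetical), the two tests below execute exactly the same production line, so a coverage tool cannot tell them apart:

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class SameCoverageTest {
    // Hypothetical code under test.
    static int add(int a, int b) {
        return a + b;
    }

    // Poor test: executes add(), so the line is "covered",
    // but it asserts nothing and can never fail.
    @Test
    public void poorTest() {
        add(2, 2);
    }

    // Good test: covers exactly the same line, but actually verifies behaviour.
    @Test
    public void goodTest() {
        assertEquals(4, add(2, 2));
        assertEquals(-1, add(2, -3));
    }
}
```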
Ultimately there's no substitute for experience and judgment. A human eye, hopefully several eyes, needs to look at the test code and decide if it's good or not.
Upvotes: 6
Reputation: 10829
I am very much pro-TDD, but I don't place much importance on coverage stats. To me, the success and usefulness of unit tests is felt over a period of development time by the development team, as the tests (a) uncover bugs up front, (b) enable refactoring and change without regression, (c) help flesh out a modular, decoupled design, and (d) so on.
Or, as Martin Fowler put it, the anecdotal evidence in support of unit tests and TDD is overwhelming, but you cannot measure productivity. Read more on his bliki here: http://www.martinfowler.com/bliki/CannotMeasureProductivity.html
Upvotes: 12
Reputation: 7257
To attain a full measure of confidence in your code you need different levels of testing: unit, integration, and functional. I agree with the advice given above that testing should be automated (continuous integration) and that unit testing should cover all branches with a variety of edge-case datasets. Code coverage tools (e.g. Cobertura, Clover, EMMA) can identify holes in your branch coverage, but not in the quality of your test datasets. Static code analysis tools such as FindBugs, PMD, and CPD can identify problem areas in your code before they become an issue, and they go a long way towards promoting better development practices.
Testing should replicate, as closely as possible, the overall environment the application will run in. It should start from the simplest case (unit) and work up to the most complex (functional). In the case of a web application, an automated process that runs through all the use cases of your website in a variety of browsers is a must, so something like SeleniumRC should be in your toolkit.
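As a hedged sketch of such a browser-driven functional test, here is roughly what it might look like with the current Selenium WebDriver API (the successor to SeleniumRC); the URL and element names are hypothetical:

```java
import static org.junit.Assert.assertTrue;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class LoginFunctionalTest {
    private WebDriver driver;

    @Before
    public void openBrowser() {
        driver = new ChromeDriver();
    }

    @Test
    public void registeredUserCanLogIn() {
        driver.get("https://example.com/login");            // hypothetical URL
        driver.findElement(By.name("username")).sendKeys("fred");
        driver.findElement(By.name("password")).sendKeys("secret");
        driver.findElement(By.id("login-button")).click();  // hypothetical element id
        assertTrue(driver.getPageSource().contains("Welcome, Fred"));
    }

    @After
    public void closeBrowser() {
        driver.quit();
    }
}
```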
However, software exists to meet a business need so there is also testing against requirements. This tends to be more of a manual process based on functional (web) tests. Essentially, you'll need to build a traceability matrix against each requirement in the specification and the corresponding functional test. As functional tests are created they are matched up against one or more requirements (e.g. Login as Fred, update account details for password, logout again). This addresses the issue of whether or not the deliverable matches the needs of the business.
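For illustration, a traceability matrix can be as simple as the following (the requirement IDs and tests are hypothetical):

| Requirement | Functional test(s) |
| --- | --- |
| REQ-007: Registered user can log in | Login as Fred with valid credentials |
| REQ-012: User can change password | Login as Fred, update account details for password, logout again |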
Overall, I would advocate a test-driven development approach based on some flavour of automated unit testing (JUnit, nUnit, etc.). For integration testing I would recommend a test database that is automatically populated at each build with a known dataset illustrating common use cases, which other tests can build on. For functional testing you'll need some kind of user-interface robot (SeleniumRC for web, Abbot for Swing, etc.). Metrics about each can easily be gathered during the build process and displayed on the CI server (e.g. Hudson) for all developers to see.
Upvotes: 9
Reputation: 60556
My tip is not a way to determine whether you have good unit tests per se, but it's a way to grow a good test suite over time.
Whenever you encounter a bug, either in your own development or reported by someone else, fix it twice. First create a unit test that reproduces the problem. Once you have a failing test, go and fix the problem.
If a problem was there in the first place, that hints at a subtlety in the code or the domain. Adding a test for it lets you make sure it will never be reintroduced in the future.
Another interesting aspect about this approach is that it'll help you understand the problem from a higher level before you actually go and look at the intricacies of the code.
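A minimal sketch of this "fix it twice" flow, assuming JUnit 4 (the `parse` method and the bug report are hypothetical):

```java
import static org.junit.Assert.assertEquals;

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.junit.Test;

public class OrderParserRegressionTest {
    // Hypothetical code under test. The buggy version split unconditionally,
    // so parse("") returned a one-element list containing "" -- the reported bug.
    static List<String> parse(String line) {
        if (line.isEmpty()) {
            return Collections.emptyList();  // the fix
        }
        return Arrays.asList(line.split(","));
    }

    // Step 1: a test that reproduces the bug report (it fails against the buggy code).
    // Step 2: fix parse() until it passes; the test then guards against regression forever.
    @Test
    public void emptyLineYieldsNoItems() {
        assertEquals(0, parse("").size());
    }

    @Test
    public void normalLineStillParses() {
        assertEquals(2, parse("apples,pears").size());
    }
}
```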
Also, +1 for the value and pitfalls of test coverage already mentioned by others.
Upvotes: 39
Reputation: 1262
Code coverage is a useful metric, but it should be used carefully. Some people take code coverage, especially the percentage covered, a bit too seriously and see it as THE metric for good unit testing.
My experience tells me that rather than trying to reach 100% coverage, which is not that easy, people should focus on checking that the critical sections are covered. But even then you may get false positives.
Upvotes: 14
Reputation: 2924
I normally do TDD, so I write the tests first, which helps me see how I want to be able to use the objects.
Then, when I'm writing the classes, I can for the most part spot common pitfalls (i.e. assumptions that I'm making, e.g. a variable being of a particular type or within a range of values), and when these come up I write a specific test for that specific case.
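For example, here is a hedged sketch of pinning such an assumption down as a test (the `Thermostat` class and its range are hypothetical, JUnit 4):

```java
import org.junit.Test;

public class ThermostatTest {
    // Hypothetical class under test, written after the test existed.
    static class Thermostat {
        void setTarget(int celsius) {
            if (celsius < 5 || celsius > 30) {
                throw new IllegalArgumentException("target out of range: " + celsius);
            }
        }
    }

    // The assumption "the target is between 5 and 30 degrees" pinned down as a test.
    @Test(expected = IllegalArgumentException.class)
    public void rejectsTargetBelowSupportedRange() {
        new Thermostat().setTarget(-40);
    }
}
```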
Aside from that, and getting as good code coverage as possible (sometimes 100% isn't achievable), you're more or less done. Then, if any bugs do come up in the future, you make sure you first write a test case that exposes the bug and will pass once it's fixed. Then fix as per normal.
Upvotes: 5
Reputation: 405715
Code coverage is to testing as testing is to programming: it can only tell you when there is a problem; it can't tell you when everything works. You should have 100% code coverage, and go beyond it. Branches of code logic should be tested with several input values, fully exercising normal, edge, and corner cases.
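A hedged sketch of what "beyond coverage" means in practice (the `abs` method is hypothetical, JUnit 4): one input per branch already yields 100% branch coverage, but the edge and corner values are what actually earn confidence:

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class AbsTest {
    // Hypothetical branch under test.
    static int abs(int x) {
        return x < 0 ? -x : x;
    }

    @Test
    public void exercisesNormalEdgeAndCornerCases() {
        assertEquals(7, abs(7));    // normal value, positive branch
        assertEquals(7, abs(-7));   // normal value, negative branch
        assertEquals(0, abs(0));    // edge between the branches
        assertEquals(1, abs(-1));   // just past the edge
        // Corner: abs(Integer.MIN_VALUE) itself overflows -- exactly the kind of
        // case a single "covered" input per branch would never reveal.
        assertEquals(Integer.MAX_VALUE, abs(Integer.MIN_VALUE + 1));
    }
}
```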
Upvotes: 6
Reputation: 8109
If it can break, it should be tested. If it can be tested, it should be automated.
Upvotes: 8