Reputation: 2549

Unit testing with random data

I've read that generating random data in unit tests is generally a bad idea (and I do understand why), but testing on random data and then constructing a fixed unit test case from random tests which uncovered bugs seems nice. However I don't understand how to organize it nicely. My question is not related to a specific programming language or to a specific unit test framework actually, so I'll use python and some pseudo unit test framework. Here's how I see coding it:

def random_test_cases():
   datasets = [
       dataset1,
       dataset2,
       ...
       datasetn
   ]
   for dataset in datasets:
       assertTrue(...)
       assertEquals(...)
       assertRaises(...)
       # and so on

The problem is: when this test case fails I can't figure out which dataset caused failure. I see two ways of solving it:

Create a single test case per dataset — the problem is load of test cases and code duplication.
Usually test framework lets us pass a message to assert functions (in my example I could do something like assertTrue(..., message = str(dataset))). The problem is that I should pass such a message to each assert, which does not look like elegant too.

Is there a simpler way of doing it?

Upvotes: 3

Answers (5)

piccolbo

Reputation: 1315

In quickcheck for R we tried to solve this problem as follows

the tests are actually pseudo-random (the seed is fixed) so you can always reproduce your tests results (barring external factors, of course)
the test function returns enough data to reproduce the error, including the assertion that failed and the data that made it fail. A convenience function, repro, called on the return value of test will land you in the debugger at the beginning of the failing assertion, with arguments set to the witnesses of the failure. If the tests are executed in batch mode, equivalent information is stored in a file and the command to retrieve it is printed in stderr. Then you can call repro as before. Whether or not you program in R, I would love to know if this starts to address you requirements. Some aspects of this solution may be hard to implement in languages that are less dynamic or don't have first class functions.

Upvotes: 0

Stefan Kanev

Reputation: 3040

I also think it's a bad idea.

Mind you, not throwing random data at your code, but having unit tests doing that. It all boils down to why you unit test in the first place. The answer is "to drive the design of the code". Random data doesn't drive the design of the code, because it depends on a very rigid public interface. Mind you, you can find bugs with it, but that's not what unit tests are about. And let me note that I'm talking about unit tests, and not tests in general.

That being said, I strongly suggest taking a look at QuickCheck. It's Haskell, so it's a bit dodgy on presentation and a bit PhD-ish on documentation, but you should be able to figure it out. I'm going to summarize how it works, though.

After you pick the code you want to test (let's say the sort() function), you establish invariants which should hold. In this examples, you can have the following invariants if result = sort(input):.

Every element in result should be smaller than or equal to the next one.
Every element in input should be present in result the same number of times.
result and input should have the same length (this is repeats the previous, but let's have it for illustration).

You encode each variant in a simple function that takes the result and the output and checks whether those invariants code.

Then, you tell QuickCheck how to generate input. Since this is Haskell and the type system kicks ass, it can see that the function takes a list of integers and it knows how to generate those. It basically generates random lists of random integers and random length. Of course, it can be more fine-grained if you have a more complex data type (for example, only positive integers, only squares, etc.).

Finally, when you have those two, you just run QuickCheck. It generates all that stuff randomly and checks the invariants. If some fail, it will show you exactly which ones. It would also tell you the random seed, so you can rerun this exact failure if you need to. And as an extra bonus, whenever it gets a failed invariant, it will try to reduce the input to the smallest possible subset that fails the invariant (if you think of a tree structure, it will reduce it to the smallest subtree that fails the invariant).

And there you have it. In my opinion, this is how you should go about testing stuff with random data. It's definitely not unit tests and I even think you should run it differently (say, have CI run it every now and then, as opposed to running it on every change (since it will quickly get slow)). And let me repeat, it's a different benefit from unit testing - QuickCheck finds bugs, while unit testing drives design.

Upvotes: 2

Gishu

Reputation: 136633

Usually the unit test frameworks support 'informative failures' as long as you pick the right assertion method.

However if everything else doesn't work, You could easily trace the dataset to the console/output file. Low tech but should work.

[TestCaseSource("GetDatasets")]
public Test.. (Dataset d)
{
   Console.WriteLine(PrettyPrintDataset(d));
   // proceed with checks
   Console.WriteLine("Worked!");
}

Upvotes: 0

Nick Stinemates

Reputation: 44150

I still think it's a bad idea.

Unit tests need to be straightforward. Given the same piece of code and the same unit test, you should be able to run it infinitely and never get a different response unless there's an external factor coming in to play. A goal contrary to this will increase maintenance cost of your automation, which defeats the purpose.

Outside of the maintenance aspect, to me it seems lazy. If you put thought in to your functionality and understand the positive as well as the negative test cases, developing unit tests are straightforward.

I also disagree with the user who shows how to do multiple tests cases inside of the same test case. When a test fails, you should be able to tell immediately which test failed and know why it failed. Tests should be as simple as you can make them and as concise/relevant to the code under test as possible.

Upvotes: 6

user1494736

Reputation: 2400

You could define tests by extension instead of enumeration, or you could call multiple test cases from a single case.

calling multiple test cases from a single test case:

MyTest()
{
    MyTest(1, "A")
    MyTest(1, "B")
    MyTest(2, "A")
    MyTest(2, "B")
    MyTest(3, "A")
    MyTest(3, "B")
}

And there are sometimes elegant ways to achieve this with some testing frameworks. Here is how to do it in NUnit:

[Test, Combinatorial]
public void MyTest(
    [Values(1,2,3)] int x,
    [Values("A","B")] string s)
{
    ...
}

Upvotes: 2

Unit testing with random data

Answers (5)

Related Questions