devoured elysium
devoured elysium

Reputation: 105067

Removing duplication in Test Driven Development

I am developing a small parser class TDD style. Here are my tests:

    ...

    [TestMethod]
    public void Can_parse_a_float() {
        InitializeScanner("float a");
        Token expectedToken = new Token("float", "a");

        Assert.AreEqual(expectedToken, scanner.NextToken());
    }

    [TestMethod]
    public void Can_parse_an_int() {
        InitializeScanner("int a");
        Token expectedToken = new Token("int", "a");

        Assert.AreEqual(expectedToken, scanner.NextToken());            
    }

    [TestMethod]
    public void Can_parse_multiple_tokens() {
        InitializeScanner("int a float b");

        Token firstExpectedToken = new Token("int", "a");
        Token secondExpectedToken = new Token("float", "b");

        Assert.AreEqual(firstExpectedToken, scanner.NextToken());
        Assert.AreEqual(secondExpectedToken, scanner.NextToken());
    }

What is bugging me is that the last test is exercising the same lines of code that both Can_parse_a_float() and Can_parse_an_int(). On one hand, it is exercising something that both those methods aren't: that from a source code string, I can get several tokens. On the other hand, were Can_parse_a_float() and Can_parse_an_int() fail, Can_parse_multiple_tokens() would fail too.

I feel there are 4 goals at stake here:

I am offering cookies to anyone who shares his opinions on how to better approach this scenario. Thanks!

Upvotes: 3

Views: 658

Answers (3)

tallseth
tallseth

Reputation: 3675

This has happened to me occasionally also. If you stick to the fail-pass-refactor cycle, sometimes you find out that two tests actually turn in to the same one in terms of the code and logic that they exercise.

The appropriate response to this situation varies, but becomes clearer if you think of the tests as a technical specification rather than a test. If the test communicates something unique about what the class under test should do, than keep it. If it's a duplicate specification, delete it. I've often kept tests like this around and had a later change to behavior cause them to become unique from a code path perspective again. If I had decided to drop that specification, I might have missed the new code path later.

The real unit of vale your third test provides is describing and insisting upon correct multi-token behavior. This seems like a unique specification in that set of tests to me, so I would keep it.

Upvotes: 1

xpmatteo
xpmatteo

Reputation: 11408

In my opinion, your concern shows that there is a design problem. You have two separate responsibilities:

  • Transforming a prefix of the input string to a token
  • Repeating the above operation keeping track of "where I am" in the input string

Your current design assigns these two to a single Parser object. This is why you can't test the two responsibilities separately. A better design would be to define a Tokenizer for the first responsibility and a Parser for the second. Sorry I'm too rusty on C# so I will write Java.

One design problem in the tokenizer is that it should return two values: the token, and how much of the string was consumed. In Java Strings are immutable, so I can't change the input argument. I would use a StringReader instead, which is a stream of characters that you can consume. My first test could be:

@Test public void tokenizes_an_integer() {
    Tokenizer tokenizer = new Tokenizer();
    StringReader input = new StringReader("int a rest of the input");

    Token token = tokenizer.tokenize(input);

    assertEquals(new Token("a", "int"), token);
    assertEquals(" rest of the input", input.toString());
}

When this passes, I could write a test for the second responsibility:

@Test public void calls_tokenizer_repeatedly_consuming_the_input() {
    StringReader input = new StringReader("int a int b");
    Parser parser = new Parser(input, new Tokenizer());

    assertEquals(new Token("a", "int"), parser.nextToken());
    assertEquals(new Token("b", "int"), parser.nextToken());
    assertEquals(null, parser.nextToken());
}

This is better, but still is not perfect from the point of view of test maintainability. If you decide to change the syntax of the "int" token, both tests will break. Ideally you would like only the first one to break. One solution would be to use a fake tokenizer in the second test, that does not depend on the real one.

This is something I'm still trying to master. One useful resource is the "Growing Object-Oriented Software" book, which is very good on test independence and expressivity.

Where's the cookies? :-)

Upvotes: 3

tvanfosson
tvanfosson

Reputation: 532455

So, my question is -- did you do the simplest thing possible when you wrote code that passed the first two tests or did you work ahead, knowing about the third requirement (test)? If you wrote the simplest code possible and it passed the third test without writing new code, that test wasn't necessary. If you had to modify the code, then the third test was needed and served to define the code. Yes, they are now all three exercising the same lines of code, but those lines are (should be) different now that you've written the third test. I see no problem with this.

Upvotes: 5

Related Questions