The number one problem I see among developers practicing test-first development, and the one that most impedes them from refactoring their code, is that they over-specify behavior in their tests. This leads them to write more tests than they need, which becomes a burden when refactoring.
I am a big advocate of having a complete test suite, and even of erring on the side of caution when it comes to quality engineering and software validation, but that is not what we’re talking about here. What we’re talking about here are the tests we write when doing test-first development. I’m proposing that writing those tests from the perspective of specifying the behaviors we want to create is a highly valuable way of writing tests, because it drives us to think at the right level of abstraction for creating behavioral tests, which in turn gives us the freedom to refactor our code without breaking it.
So the question becomes how many tests are enough?
Have you ever played the game Twenty Questions? Most of us have played it at some point in our lives. One person thinks of something that could be an animal, vegetable, or mineral, and then answers yes/no questions from the other players. The point of the game is to ask as few questions as possible in order to accurately guess what the person is thinking of.
This is how I think of the unit tests I write to specify behavior as I’m doing test-first development. I ask: what are the fewest tests I need to write in order to assert the behavior I want to create? Notice how doing this as part of the test-first methodology makes a lot of sense, because we’re essentially asking what assertions to create in order to be driven to build the behavior that makes those assertions pass.
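To make that concrete, here is a minimal sketch in Java with JUnit 5. The `PriceCalculator` class and its behavior are my own hypothetical example, not something from this post: a single test, phrased as an acceptance criterion, that is just enough to drive the behavior into existence.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

// Hypothetical production class, shown so the test compiles; in true
// test-first style it starts as the simplest thing that could pass.
class PriceCalculator {
    private final double rate;

    PriceCalculator(double rate) {
        this.rate = rate;
    }

    double totalWithTax(double subtotal) {
        return 108.00; // hard-coded: just enough to make the test pass
    }
}

class PriceCalculatorTest {
    // The fewest assertions needed to specify the behavior: a total
    // includes sales tax. Nothing here says how the calculator computes
    // it, so the implementation stays free to change.
    @Test
    void totalIncludesSalesTax() {
        PriceCalculator calculator = new PriceCalculator(0.08); // 8% tax rate
        assertEquals(108.00, calculator.totalWithTax(100.00), 0.001);
    }
}
```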
For the type of behavioral testing we’re talking about when doing test-first development, our goal is to make each test unique, so we only test the main scenarios. We write tests for the scenarios that drive us to write the code we want to create, but again, that isn’t necessarily all the tests we need. We want to add any additional tests as part of our quality engineering effort, not as part of the effort of doing test-first development.
Here’s the challenge: when I’m given only one example of a process, it can be difficult to generalize from it and write a good general algorithm. In these situations, I’ll sometimes come up with a second scenario that I can use to refactor my first implementation into a more generalized one.
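Continuing the hypothetical `PriceCalculator` sketch above: the hard-coded implementation passes the first test, so a second scenario is what forces the generalization.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

class PriceCalculatorTriangulationTest {
    // The first test alone is satisfied by the hard-coded
    //     return 108.00;
    // This second scenario forces the general rule:
    //     return subtotal * (1 + rate);
    // Once that rule exists, this test asserts nothing the first doesn't.
    @Test
    void totalIncludesSalesTaxForASecondAmount() {
        assertEquals(216.00, new PriceCalculator(0.08).totalWithTax(200.00), 0.001);
    }
}
```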
But when I do this, I end up with a second test that is redundant with the first. Some people believe that second test is still valuable: they reason that since you needed it to create the production code, someone else would probably need it in the future to understand the system or to recreate the code from the tests, and that makes sense. But to me that additional test is redundant, so I typically delete it or move it into a different namespace where I keep my other quality assurance tests.
This has led me to follow a practice I learned from Amir Kolsky, which involves separating what he calls red tests from green tests. Green tests are the kind of tests we write when doing test-first development; we use them to drive us to create the behavior we want in the system. Red tests, on the other hand, are tests we write after we write our code, to validate that it works as expected.
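One way to keep the two kinds apart (a sketch of my own; the practice itself doesn’t prescribe a mechanism) is JUnit 5’s `@Tag` annotation, with the redundant triangulation test from above parked in the red suite:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

// Green: written first; specifies behavior and must survive refactoring.
@Tag("green")
class PriceCalculatorSpecTest {
    @Test
    void totalIncludesSalesTax() {
        assertEquals(108.00, new PriceCalculator(0.08).totalWithTax(100.00), 0.001);
    }
}

// Red: written after the code; the redundant triangulation test from
// earlier is parked here with the other quality assurance tests.
@Tag("red")
class PriceCalculatorQaTest {
    @Test
    void totalIncludesSalesTaxForASecondAmount() {
        assertEquals(216.00, new PriceCalculator(0.08).totalWithTax(200.00), 0.001);
    }
}
```

A build can then run only the green suite on every change, for example with Gradle’s `useJUnitPlatform { includeTags 'green' }`, leaving the red suite to a separate QA pass.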
Why separate red tests from green tests? Because my green tests serve a fundamentally different purpose. They are there to act as a living specification, validating that behaviors work as expected. Regardless of whether they’re implemented in a unit testing framework or an acceptance testing framework, they are in essence acceptance tests, because they’re based on validating behaviors, or acceptance criteria, rather than implementation details. I call these developer tests because they’re the kind of tests we write in the course of doing development. Conversely, red tests are tests I write after the code is written, to lock down some implementation. Red tests are more like QA tests.
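To illustrate the distinction with one more hypothetical (the `TaxTable` interface, the `RegionalPriceCalculator`, and its caching detail are all my own invention, and the sketch assumes JUnit 5 plus Mockito): a red test can lock down a performance-motivated caching choice that no acceptance criterion ever mentions.

```java
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.times;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

// Hypothetical collaborator the implementation consults for tax rates.
interface TaxTable {
    double rateFor(String region);
}

// Hypothetical implementation that caches the rate at construction time.
class RegionalPriceCalculator {
    private final double rate; // the cached implementation detail

    RegionalPriceCalculator(TaxTable table, String region) {
        this.rate = table.rateFor(region);
    }

    double totalWithTax(double subtotal) {
        return subtotal * (1 + rate);
    }
}

@Tag("red")
class RegionalPriceCalculatorQaTest {
    // Written after the code, this test locks down the caching detail.
    // Refactoring to look the rate up on every call would break it,
    // even though every total would still be correct.
    @Test
    void looksUpTheTaxRateOnlyOnce() {
        TaxTable table = mock(TaxTable.class);
        when(table.rateFor("CA")).thenReturn(0.0725);

        RegionalPriceCalculator calculator = new RegionalPriceCalculator(table, "CA");
        calculator.totalWithTax(100.00);
        calculator.totalWithTax(200.00);

        verify(table, times(1)).rateFor("CA");
    }
}
```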
When I refactor my code, I expect none of my green tests to break. If red tests break, that’s okay; remember, red tests can be implementation-dependent, and when I change an implementation it may cause some red tests to break. But it shouldn’t break any green tests. I find this a valuable distinction.
Note: This blog post is based on one of the “Seven Strategies…” sections in my book, Beyond Legacy Code: Nine Practices to Extend the Life (and Value) of Your Software.