Wednesday, 29 August 2012

Mutation Testing


Does "mutation testing" ring any bells? I am sure it reminds you of the X-Men series, where we heard the word "mutants" many times. So who were they? They were the ones whose genes had been modified, giving them some special difference from everyone else. Likewise, in mutation testing we turn the code into a mutant and observe how the test suite's behaviour changes.

What is Mutation Testing?

It is commonly assumed that the more cases a test suite contains, the higher the probability that the program will work correctly in the real world. Mutation testing was introduced as a way of measuring the accuracy of test suites. In general, there is no easy way to tell whether a test suite tests the program thoroughly. If the program passes the test suite, one can only say that the program works correctly on the cases included in the suite; this says nothing about how good the suite itself is. There is no mathematical way to measure how accurate the test suite is, or the probability that the program will work correctly.

Concept of Killed and Equivalent Mutants

The idea of mutation testing was introduced to solve the problem of measuring the accuracy of test suites. In mutation testing, one is in some sense trying to solve this problem by inverting the scenario.

The thinking goes as follows: Let’s assume that we have a perfect test suite, one that covers all possible cases. Let’s also assume that we have a perfect program that passes this test suite. If we change the code of the program (this process is called mutating) and we run the mutated program (mutant) against the test suite, we will have two possible scenarios:

  • The results of the program are affected by the code change and the test suite detects it. Since we assumed that the test suite is perfect, it must detect the change. In this case, the mutant is called a killed mutant.

  • The results of the program are not changed and the test suite does not detect the mutation. Such a mutant is called an equivalent mutant.

So the quality of the test suite is judged as follows:

Quality of the test suite/software, Q = number of killed mutants / number of mutants generated.

If Q < 1, it should be a warning sign about how sensitive the program is to code changes: some changes did not affect its results at all.

In the real world, we have neither a perfect program nor a perfect test suite. Thus, there is one more possible scenario:

  • The results of the program are different, but the test suite does not detect it because it does not have the right test case.

If we again calculate the same ratio and get a number smaller than 1, that can also indicate that the test suite is inaccurate.

In practice, there is no way to separate the effect that is related to test suite inaccuracy and that which is related to equivalent mutants. In the absence of other possibilities, one can accept the ratio of killed mutants to all the mutants as the measure of the test suite accuracy.


This C code example illustrates the ideas described above.

Can you spot the serious hidden errors that this test suite misses?

This test suite is quite representative of test suites in the industry. It tests positive test cases, which means it checks whether the program reports correct values for correct inputs, and it completely neglects illegal inputs to the program. The program fully passes the test suite; however, it has serious hidden errors.

Now, let’s mutate the program. We can start with the following simple changes:

If we run these mutated programs against the test suite, we get the following results:

  • Mutants 1 and 3: the program completely passes the test suite.
  • Mutant 2: the program fails all test cases.

Mutants 1 and 3 do not change the output of the program and are thus equivalent mutants. The test suite does not detect them.

Mutant 2, however, is not an equivalent mutant. Test cases 1-4 will detect it through wrong output from the program. Test case 5 may behave differently on different machines: it may show up as bad output from the program, or as a program crash.

If we calculate the statistics, we see that we created three mutants and only one was killed.

Thus, the quality of the test suite is Q = 1/3 (about 0.33). As we can see, this number is low, because we generated two equivalent mutants. It should serve as a warning that we are not testing enough. In fact, the program has two serious errors that should be detected by the test suite.


Kinds of Mutation

  • Value Mutation - these mutations involve changing the values of constants or parameters (by adding or subtracting values, etc.), e.g. loop bounds: being off by one at the start or finish is a very common error.
  • Decision Mutation - this involves modifying conditions to reflect potential slips and errors in the coding of conditions in programs, e.g. a typical mutation might be replacing a > with a < in a comparison.
  • Statement Mutation - this might involve deleting certain lines to reflect omissions in coding, or swapping the order of lines of code. Other operations are also possible, e.g. changing the operators in arithmetic expressions. A typical omission might be to leave out the increment of some variable in a while loop.

Benefits of Mutation Testing

  • It provides the tester with a target: develop test data capable of killing all the generated mutants. Hence, we can generate an effective test data set that is powerful enough to find errors in the program.
  • Another advantage of mutation testing is that even if no error is found, it still gives the user information about the quality of the program tested.
  • Mutation testing helps make a program less buggy and more reliable, and increases confidence in the working of the product, which is the bottom line of any software testing activity.