Monday, December 19, 2016

Auger - Automatic Unit Test Generation for Python

Unit tests are crucial to software development. They verify whether a given component actually implements what it promises. They are also important for long-living code, where future maintenance can be much more expensive without the proper level of unit test coverage. Not surprisingly, the lack of unit tests is a show-stopper for submitting production code at many companies.

You may wonder, if unit tests are so great, why do engineers hate to write them? More than once have I heard fellow engineers say "If you approve my code now, I will write the unit tests in a future change list." All engineers I know actually like writing code. It is creative. Code means impact. However, in general, engineers do not like writing unit tests at all. Why is that?

Why is Unit Testing Hard?

Unit testing is hard for various reasons:
  1. One reason would be that a unit testing framework is exactly that. A framework. When I worked on the IBM J9 and Eclipse team with Dave Thomas, he used to say "Everybody likes to write frameworks, but nobody likes to actually use them". In practice, unit test frameworks add yet another complexity to learn and master, especially when unit test frameworks have little in common across programming languages or organizations.
  2. All unit testing frameworks start off modestly. Erich Gamma once confided in me how he and Kent Beck wrote the original version of JUnit when they got bored on a transcontinental flight. He added how those few hours of work had been by far the best investment of his technical career ever. Today, however, unit testing frameworks are far from trivial and require a considerable learning investment in understanding their power and intricacies.
  3. Software systems themselves are also becoming increasingly complex. Even seemingly standalone components are heavily embedded in a context of complex runtimes. To isolate those dependencies and sculpt them with the proper "mocks" is an art. It is not uncommen to spend hours trying to find out how to write the mocks for one line of test code. Dependency injection and fancy mock syntax conspire to make the life of unit test authors challenging.
  4. Code under development constantly changes. Often, coding can be a discovery game. A design doc can outline architectural decisions on what database to use and how the UI is created. However, it tends to leave the actual coding as an exercise to the reader. During coding, discoveries are made. Code is constantly refactored. Code is found obsolete and is deleted even before it is ever committed to version control. This is particularly the case when a new domain or technology is being explored. We still write unit tests, but we write them later, when the dust settles.

OK. Unit tests are hard. But, what is the alternative?

Let's agree, unit test are a chore and tough to write. This does not mean engineers are not testing their code, even if they do not write unit tests. Many engineers, like myself, write code in a highly iterative fashion. For instance, say I am writing an AppEngine app to return a web page. I would first add a route, write a Handler, and simply return "Hello World". To test, I would spin up a local instance, point my browser at localhost:8080 and see if it shows the string I expected. I iterate that step hundreds of times.

In this iterative development mode, small, incremental steps are made towards the end goal and progress is constantly validated. Each time, some more functionality is added and tested on the code in progress. At some point, we are going to be "done". At his final point in time, hundreds or thousands of "test" runs have been exercised on the code under development. Each component has had some inputs and produced some expected outputs. If only we could remember what those inputs and outputs were and our tests would be so much easier to write. Cue: Project Auger.

What is Project Auger?

Project Auger (Automated Unittest Generator) watches your Python code while you write it and automatically generates all unit tests for your code, including all the mocks. Little or no work is required by the developer.

How does Auger Work?

Auger works like a smart Python debugger that sets breakpoints for each component you are interested in. Auger tracks two kinds of function calls related to the module under test:

  • For each function defined in the module, Auger records both the values of the arguments and the returned results. After recording enough execution traces, unit tests can be generated with the meaningful placeholder argument values and assertions. 
  • For each call made from a given component to dependent libraries or other components, we record the return value, so that this call can be automatically mocked out with known return values.

Auger tracks all possible functions, including instance methods, class methods, and static functions.

Consider the following example,, that provides a Pet with a name, age, and a species:

from sample.animal import Animal

class Pet(Animal):
    def __init__(self, name, *args):
        Animal.__init__(self, *args)
        self._name = name

    def get_name(self):
        return self._name

    def lower(s):
        return s.lower()

    def __str__(self):
        return '%s is a %s aged %d' % (
            Pet.lower(self.get_species()), self.get_age()

def create_pet(name, species, age=0):
    return Pet(name, species, age)

if __name__ == '__main__':
    print(Pet('Polly', 'Parrot'))
    print(create_pet('Clifford', 'Dog', 32))

This class has a few different entry points we would need to unit test:
  • The class Pet itself which has:
    • a static method, lower
    • two instance methods, get_name and __str__
    • a constructor, __init__, which is really a very special instance method
  • A static function that creates a Pet and returns it
The Pet class is a subclass of Animal, which we know nothing about, so we will need to mock that entire class. We do know that the class is used in the Pet constructor and inherited methods are from Pet called as well. This means that the implementation for self.get_species() and self.get_age() are unknown, as we cannot look at the implementation of Animal, when unit testing Pet. Therefore, those two inherited methods will be mocked out.

Unit Tests Generation with Auger

The above class definition combined with an execution run is enough for Auger to automatically create the following fully functional unit test:

from mock import patch
from sample.animal import Animal
from import Pet
import unittest

class PetTest(unittest.TestCase):
    @patch.object(Animal, 'get_species')
    @patch.object(Animal, 'get_age')
    def test___str__(self, mock_get_age, mock_get_species):
        mock_get_age.return_value = 12
        mock_get_species.return_value = 'Dog'
        pet_instance = Pet('Clifford', 'Dog', 12)
        self.assertEquals(pet_instance.__str__(), 'Clifford is a dog aged 12')

    def test_create_pet(self):
        self.assertIsInstance(,species='Dog',name='Clifford'), Pet)

    def test_get_name(self):
        pet_instance = Pet('Clifford', 'Dog', 12)
        self.assertEquals(pet_instance.get_name(), 'Clifford')

    def test_lower(self):
        self.assertEquals(Pet.lower(s='Dog'), 'dog')

if __name__ == "__main__":

No changes to the original code are needed to teach Auger anything. All that is required is for the developer to write their original code and exercise it somehow. In the above case, we simply ran python to produce two scenarios in which Pet instances were created and manipulated. From those two sample, a single test was extracted.

Of course Auger is limited in the sense that it cannot guess what scenario is being tested. Rather than generates multiple, focused tests per module, it will generate one big test that covers the entire module. The value of Auger is more in generating all the boiler plate code, imports and mocks, ensuring proper coverage, and to generate a template for manual refinement.

To generate a set of unit tests, Auger magic is invoked:

import auger
if __name__ == "__main__":
    with auger.magic(pet):

In this case, one module is passed, pet, but multiple modules can be passed as well. Each one will be traced and unit tested.

When a unit test is produced, it is written out to the local file system under the corresponding tests folder.

IDE Integration

Auger does not have direct IDE integration per se, but works really well with PyCharm. This integration comes for free, because the IDE watches the underlying file system and will automatically discover when new files are created by Auger in the local file system. These tests can then be executed easily as well:

Adding the generated tests to Git and commit/push them to a repository means just a few clicks from that point onwards in an IDE such as PyCharm.

Future Work

  • Incremental test generation. Collect multiple execution runs, persist the invocations, and merge multiple runs into one test case.
  • Preserve manual edits performed by users on generated test cases when a test is regenerated.
  • Support other unit test frameworks, such as pytest, nose, cucumber, etc.
  • Figure out how to run Auger on itself. This is non-trivial :-)

Check out Project Auger and let me know what you think of it. Contributions are welcomed.

Python code styled as tango by with border:1px solid #ddd;padding:11px 7px;

No comments:

Post a Comment