Chuck

Reputation: 4892

How do I test a method that requires a file's presence?

For the first time, I'm using Python to create a library, and I'm trying to take the opportunity in this project to learn unit testing. I've written a first method and I want to write some unit tests for it. (Yes, I know that TDD requires I write the test first, I'll get there, really.)

The method is fairly simple, but it expects that the class has a file attribute set, that the attribute points to an existing file, and that the file is an archive of some sort (currently only working with zip files, tar, rar, etc., to be added later). The method is supposed to return the number of files in the archive.

I've created a folder in my project called files that contains a few sample files, and I've manually tested the method and it works as it should so far. The manual test looks like this, located in the archive_file.py file:

import os
from os import path

if __name__ == '__main__':
    archive = ArchiveFile()

    script_path = path.realpath(__file__)
    parent_dir = path.abspath(path.join(script_path, os.pardir))
    targ_dir = path.join(parent_dir, 'files')
    targ_file = path.join(targ_dir, 'test.zip')

    archive.file = targ_file

    print(archive.file_count())

All I do then is make sure that what's printed is what I expect given the contents of test.zip.

Here's what file_count looks like:

def file_count(self):
    """Return the number of files in the archive."""
    if self.file == None:
        return -1

    with ZipFile(self.file) as zip:
        members = zip.namelist()
        # Remove folder members if there are any.
        pruned = [item for item in members if not item.endswith('/')]
        return len(pruned)

Directly translating this to a unit test seems wrong to me for a few reasons, some of which may be invalid: I'm counting on the precise location of the test files in relation to the current script file; I'll need a large sample of manually created archive files to make sure I'm testing enough variations; and, of course, I'm manually comparing the returned value to what I expect because I know how many files are in the test archive.

It seems to me that this should be automated as much as possible, but it also seems that doing so is going to be very complicated.

What's the proper way to create unit tests for such a class method?

Upvotes: 11

Views: 3058

Answers (4)

toolforger

Reputation: 841

Mocking works. Another approach is setting up an environment.

For your use case, setting up an environment would mean creating a temporary directory, copying whatever kinds of files you expect to live there, and running your tests in it.

You do have to add a parameter or a global that tells your code in what directory to look.
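A minimal sketch of this approach using the standard tempfile module; the inline ArchiveFile is only a stand-in for the question's class, and the test builds its own archive in setUp, so no fixture files need to exist on disk beforehand:

```python
import os
import tempfile
import unittest
import zipfile
from zipfile import ZipFile


class ArchiveFile:
    """Minimal stand-in for the question's class."""

    def __init__(self):
        self.file = None

    def file_count(self):
        if self.file is None:
            return -1
        with ZipFile(self.file) as zf:
            return len([m for m in zf.namelist() if not m.endswith('/')])


class ArchiveFileEnvTest(unittest.TestCase):
    def setUp(self):
        # Build a known archive inside a throwaway directory.
        self.tmp_dir = tempfile.TemporaryDirectory()
        self.zip_path = os.path.join(self.tmp_dir.name, 'test.zip')
        with zipfile.ZipFile(self.zip_path, 'w') as zf:
            zf.writestr('dir/file1', b'one')
            zf.writestr('dir/file2', b'two')

    def tearDown(self):
        # Delete the directory and everything in it.
        self.tmp_dir.cleanup()

    def test_file_count(self):
        archive = ArchiveFile()
        archive.file = self.zip_path
        self.assertEqual(2, archive.file_count())
```

Because the archive is generated by the test itself, the expected count is known by construction rather than by inspecting a committed file.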

This has been working quite well for me. However, my use case is somewhat different in that I am writing tests for code that uses an external program, so I have no way to mock anything.

Upvotes: 0

Jeremy Thien

Reputation: 115

I suggest refactoring the pruning logic into a separate method that does not depend on file or ZipFile. This:

def file_count(self):
    ...
    with ZipFile(self.file) as zip:
        members = zip.namelist()
        # Remove folder members if there are any.
        pruned = [item for item in members if not item.endswith('/')]
        return len(pruned)

Becomes:

def file_count(self):
    ...
    with ZipFile(self.file) as zip:
        return len(self.prune_dirs(zip.namelist()))

def prune_dirs(self, members):
    # Remove folder members if there are any.
    pruned = [item for item in members if not item.endswith('/')]
    return pruned

Now, testing prune_dirs can be done without any test files.

members = ["a/", "a/b/", "a/b/c", "a/b/d"]
print(archive.prune_dirs(members))

If you want to avoid integration testing, then you have to fake or mock ZipFile. In this case, any object that provides the method namelist() would do.

class FakeZipFile():

    def __init__(self, filename):
        self.filename = filename

    def namelist(self):
        return ['a', 'b/', 'c']

Now I introduce a new method get_helper() on ArchiveFile:

class ArchiveFile():

    def get_helper(self):
        return ZipFile(self.file)

    def file_count(self):
        ...
        helper = self.get_helper()
        return len(self.prune_dirs(helper.namelist()))

...and override get_helper() in a Testing child class.

class ArchiveFileTesting(ArchiveFile):

    def get_helper(self):
        return FakeZipFile(self.file)

A testing class lets you override just what you need from ArchiveFile to eliminate the dependency on ZipFile. In your test, use the testing class, and I think you have good coverage.

if __name__ == '__main__':
    archive = ArchiveFileTesting()

You will probably want to think of ways to change namelist so you can test more cases than the one shown here.
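For example (my sketch, not part of the answer above), the fake's name list could be supplied by the caller instead of being hard-coded, so each test can exercise a different archive layout:

```python
class FakeZipFile:
    """Test double for ZipFile whose name list is chosen per test."""

    def __init__(self, filename, names=None):
        self.filename = filename
        self._names = list(names) if names is not None else ['a', 'b/', 'c']

    def namelist(self):
        return self._names


# Each instance can now model a different archive:
empty = FakeZipFile('empty.zip', names=[])
nested = FakeZipFile('nested.zip', names=['a/', 'a/b/', 'a/b/c'])
```

The testing subclass would then pass whatever list the current test needs into FakeZipFile.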

Upvotes: 0

Chuck

Reputation: 4892

Oasiscircle and dm03514 were very helpful on this and led me to the right answer eventually, especially with dm's answer to a followup question.

What needs to be done is to use the mock library to create a fake version of ZipFile that won't object to the file not actually existing, but will instead return valid lists from the namelist method.

@unittest.mock.patch('comicfile.ZipFile')
def test_page_count(self, mock_zip_file):
    comic_file = ComicFile()
    members = ['dir/', 'dir/file1', 'dir/file2']
    mock_zip_file.return_value.__enter__.return_value.namelist.return_value \
        = members
    self.assertEqual(2, comic_file.page_count())

The __enter__.return_value portion above is necessary because, in the code being tested, the ZipFile instance is created within a context manager (a with statement).
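To see why each link in that chain is needed, here is the same setup on a bare MagicMock, stepped through outside of any test (a sketch, not the code under test):

```python
from unittest import mock

mock_zip_file = mock.MagicMock()
mock_zip_file.return_value.__enter__.return_value.namelist.return_value = \
    ['dir/', 'dir/file1', 'dir/file2']

# The code under test does: with ZipFile(path) as zip: zip.namelist()
# and the mock reproduces that link by link:
with mock_zip_file('fake.zip') as zf:   # ZipFile(...) then __enter__()
    names = zf.namelist()               # returns the configured list
```

After the with block, names holds the configured member list, exactly as the patched ZipFile would supply it inside page_count.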

Upvotes: 0

dm03514

Reputation: 55962

There are many different ways to approach this. I like to think about what would be valuable to test; off the top of my head, I can think of a few things:

  • validation logic (if self.file == None)
  • pruning logic
  • that all file types claimed to be supported are actually supported

This testing could take place on two levels:

  1. Unittest your logic
  2. Test integration (ie supported archive types against the filesystem)

Unittest the logic

Unittesting the logic of your archive objects should be trivial. There look to be a couple of tests suggested by your file_count method that could be valuable:

  • test_no_file_returns_negative_one (error conditions are "hopefully" not very frequently executed code paths, and are great candidates for tests, especially if your clients are expecting this -1 return value).

  • test_zip_file_pruned_logic: this looks to be very important functionality in your code; if implemented incorrectly, it would completely throw off the count that your code claims to be able to return.

  • test_happy_path_file_count_successful: I like to have a unittest that exercises the whole function, using mocked dependencies (ZipFile) to make sure that everything is covered, without having to run the integration tests.
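A sketch of the first and last of those tests; the inline ArchiveFile is a minimal stand-in for the question's class, and in a real project the patch target would name the module where ZipFile is imported:

```python
import unittest
from unittest import mock
from zipfile import ZipFile


class ArchiveFile:
    """Minimal stand-in for the question's class."""

    def __init__(self):
        self.file = None

    def file_count(self):
        if self.file is None:
            return -1
        with ZipFile(self.file) as zf:
            return len([m for m in zf.namelist() if not m.endswith('/')])


class FileCountTests(unittest.TestCase):
    def test_no_file_returns_negative_one(self):
        self.assertEqual(-1, ArchiveFile().file_count())

    # Patch ZipFile where it is looked up, i.e. in this module.
    @mock.patch(__name__ + '.ZipFile')
    def test_happy_path_file_count_successful(self, mock_zip):
        mock_zip.return_value.__enter__.return_value.namelist.return_value = \
            ['dir/', 'dir/file1', 'dir/file2']
        archive = ArchiveFile()
        archive.file = 'fake.zip'
        self.assertEqual(2, archive.file_count())
```

The happy-path test never touches the filesystem: the mocked ZipFile hands back a fixed member list, and the assertion checks only the pruning and counting logic.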

Test Integrations

I think a test for each supported archive type would be very valuable. These could be static fixtures that live in your repo, and your tests would already know how many files that each archive has and would assert on that.

I think all of your concerns are valid, and all of them can be addressed, and tested, in a maintainable way:

I'm counting on the precise location of the test files in relation to the current script file

This could be addressed by the convention of having your file fixtures stored in a subdirectory of your test package, and then using python to get the file path of your test package:

FIXTURE_DIR = os.path.join(os.path.dirname(__file__), 'fixtures')

For portable code it will be important to dynamically generate these paths.

I'll need a large sample of manually created archive files to make sure I'm testing enough variations

Yes, how many are good enough? AT LEAST one test per supported archive type. (Netflix has to test against every single device it has an app on; plenty of companies have to run tests against a large matrix of mobile devices.) I think the test coverage here is crucial, but try to put all the edge cases that need to be covered in unittests.

I'm manually comparing the returned value to what I expect because I know how many files are in the test archive.

The archive will have to become static and your test will store that information.
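In practice that can be a mapping from each fixture to its known count, iterated with subTest. The sketch below (layouts and counts are made up) builds in-memory archives so it runs anywhere, but the pattern is the same with committed fixture files:

```python
import io
import unittest
import zipfile


def make_zip(names):
    """Build an in-memory zip whose members have the given names."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        for name in names:
            zf.writestr(name, b'')
    buf.seek(0)
    return buf


class FixtureCountTests(unittest.TestCase):
    # Each "fixture" pairs an archive layout with its known file count.
    CASES = [
        (['a'], 1),
        (['dir/', 'dir/file1', 'dir/file2'], 2),
        ([], 0),
    ]

    def test_counts(self):
        for names, expected in self.CASES:
            with self.subTest(names=names):
                with zipfile.ZipFile(make_zip(names)) as zf:
                    pruned = [m for m in zf.namelist()
                              if not m.endswith('/')]
                self.assertEqual(expected, len(pruned))
```

With static files the CASES table would simply hold fixture file names and the counts recorded when the fixtures were created.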


One thing to keep in mind is the scope of your tests. A test that exercises ZipFile itself wouldn't be very valuable, because it's in the standard library and already has tests. Additionally, testing that your code works on all the filesystems and operating systems Python supports probably wouldn't be very valuable either, as Python already has those checks.

But scoping your tests to verify that your application works with all the file types it says it supports is, I believe, extremely valuable, because it is a contract between you and your clients, saying "hey, this DOES work, let me show you". It's the same way that Python's tests are a contract between it and you, saying "hey, we support OSX/Linux/whatever, let me show you".

Upvotes: 2
