Reputation: 4892
For the first time, I'm using Python to create a library, and I'm trying to take the opportunity in this project to learn unit testing. I've written a first method and I want to write some unit tests for it. (Yes, I know that TDD requires I write the test first, I'll get there, really.)
The method is fairly simple, but it expects that the class has a file attribute set, that the attribute points to an existing file, and that the file is an archive of some sort (currently only zip files; tar, rar, etc. will be added later). The method is supposed to return the number of files in the archive.
I've created a folder in my project called files that contains a few sample files, and I've manually tested the method; it works as it should so far. The manual test looks like this, located in the archive_file.py file:
import os
from os import path

if __name__ == '__main__':
    archive = ArchiveFile()
    script_path = path.realpath(__file__)
    parent_dir = path.abspath(path.join(script_path, os.pardir))
    targ_dir = path.join(parent_dir, 'files')
    targ_file = path.join(targ_dir, 'test.zip')
    archive.file = targ_file
    print(archive.file_count())
All I do then is make sure that what's printed is what I expect given the contents of test.zip. Here's what file_count looks like:
def file_count(self):
    """Return the number of files in the archive."""
    if self.file is None:
        return -1
    with ZipFile(self.file) as zip:
        members = zip.namelist()
    # Remove folder members if there are any.
    pruned = [item for item in members if not item.endswith('/')]
    return len(pruned)
Directly translating this to a unit test seems wrong to me for a few reasons, some of which may be invalid: I'm counting on the precise location of the test files in relation to the current script file; I'll need a large sample of manually created archive files to make sure I'm testing enough variations; and, of course, I'm manually comparing the returned value to what I expect because I know how many files are in the test archive.
It seems to me that this should be automated as much as possible, but it also seems that doing so is going to be very complicated.
What's the proper way to create unit tests for such a class method?
Upvotes: 11
Views: 3058
Reputation: 841
Mocking works.
Another approach is setting up an environment.
For your use case, setting up an environment would mean creating a temporary directory, copying in whatever kinds of files you expect to live there, and running your tests against it.
You do have to add a parameter or a global that tells your code which directory to look in.
This has been working quite well for me. However, my use case is somewhat different in that I am writing tests for code that uses an external program, so I have no way to mock anything.
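As a rough sketch of the environment approach for this question, the test below builds a real zip archive in a temporary directory, runs the count against it, and lets tempfile.TemporaryDirectory clean everything up. ArchiveFile here is a minimal stand-in for the asker's class, not the real implementation:

```python
import os
import tempfile
import zipfile

# Minimal stand-in for the asker's class, assumed for this sketch.
class ArchiveFile:
    def __init__(self):
        self.file = None

    def file_count(self):
        if self.file is None:
            return -1
        with zipfile.ZipFile(self.file) as archive:
            return len([m for m in archive.namelist() if not m.endswith('/')])

def count_in_generated_archive():
    # Build a throwaway zip in a temp dir; no fixture files live in the repo.
    with tempfile.TemporaryDirectory() as tmp_dir:
        zip_path = os.path.join(tmp_dir, 'test.zip')
        with zipfile.ZipFile(zip_path, 'w') as archive:
            archive.writestr('a.txt', 'first file')
            archive.writestr('sub/b.txt', 'second file')
        archive_file = ArchiveFile()
        archive_file.file = zip_path
        return archive_file.file_count()
```

Because the test generates the archive itself, it always knows the expected count (here, 2), and the temporary directory disappears when the with block exits.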
Upvotes: 0
Reputation: 115
I suggest refactoring the pruning logic into a separate method that depends on neither file nor ZipFile. This:
def file_count(self):
    ...
    with ZipFile(self.file) as zip:
        members = zip.namelist()
    # Remove folder members if there are any.
    pruned = [item for item in members if not item.endswith('/')]
    return len(pruned)
Becomes:
def file_count(self):
    ...
    with ZipFile(self.file) as zip:
        return len(self.prune_dirs(zip.namelist()))

def prune_dirs(self, members):
    # Remove folder members if there are any.
    pruned = [item for item in members if not item.endswith('/')]
    return pruned
Now, testing prune_dirs can be done without any test files.
members = ["a/", "a/b/", "a/b/c", "a/b/d"]
print(archive.prune_dirs(members))
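The same check can be made self-verifying with unittest instead of a print statement. This sketch assumes only the pure prune_dirs helper, so no archive files or ZipFile are involved:

```python
import unittest

class ArchiveFile:
    # Only the pure helper is needed for this test.
    def prune_dirs(self, members):
        # Remove folder members if there are any.
        return [item for item in members if not item.endswith('/')]

class TestPruneDirs(unittest.TestCase):
    def test_folders_are_removed(self):
        members = ["a/", "a/b/", "a/b/c", "a/b/d"]
        self.assertEqual(["a/b/c", "a/b/d"], ArchiveFile().prune_dirs(members))

    def test_empty_list_stays_empty(self):
        self.assertEqual([], ArchiveFile().prune_dirs([]))
```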
If you want to avoid integration testing, then you have to fake or mock ZipFile. In this case, any object that provides the method namelist() would do.
class FakeZipFile():
    def __init__(self, filename):
        self.filename = filename

    def namelist(self):
        return ['a', 'b/', 'c']
Now I introduce a new method, get_helper(), on ArchiveFile:
class ArchiveFile():
    def get_helper(self):
        return ZipFile(self.file)

    def file_count(self):
        ...
        helper = self.get_helper()
        return len(self.prune_dirs(helper.namelist()))
...and override get_helper() in a Testing child class.
class ArchiveFileTesting(ArchiveFile):
    def get_helper(self):
        return FakeZipFile(self.file)
A testing class lets you override just what you need from ArchiveFile to eliminate the dependency on ZipFile. In your tests, use the testing class, and I think you have good coverage.
if __name__ == '__main__':
    archive = ArchiveFileTesting()
You will probably want to think of ways to change namelist so you can test more cases than the one shown here.
Upvotes: 0
Reputation: 4892
Oasiscircle and dm03514 were very helpful on this and led me to the right answer eventually, especially with dm's answer to a followup question.
What needs to be done is to use the mock library to create a fake version of ZipFile that won't object to there not actually being a file, but will instead return a valid list from its namelist method.
@unittest.mock.patch('comicfile.ZipFile')
def test_page_count(self, mock_zip_file):
    comic_file = ComicFile()
    members = ['dir/', 'dir/file1', 'dir/file2']
    mock_zip_file.return_value.__enter__.return_value.namelist.return_value \
        = members
    self.assertEqual(2, comic_file.page_count())
The __enter__.return_value portion above is necessary because, in the code being tested, the ZipFile instance is created within a context manager.
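For a self-contained version of the same pattern, the sketch below uses a minimal stand-in for ArchiveFile that calls zipfile.ZipFile through the module, so mock.patch.object(zipfile, 'ZipFile') works regardless of module name. (With a direct `from zipfile import ZipFile`, you have to patch the importing module by its string path, as in 'comicfile.ZipFile' above.)

```python
import unittest
import zipfile
from unittest import mock

# Minimal stand-in for the asker's class, assumed for this sketch.
class ArchiveFile:
    def __init__(self):
        self.file = None

    def file_count(self):
        if self.file is None:
            return -1
        # Looks up ZipFile on the module, so patch.object can intercept it.
        with zipfile.ZipFile(self.file) as archive:
            return len([m for m in archive.namelist() if not m.endswith('/')])

class TestFileCount(unittest.TestCase):
    @mock.patch.object(zipfile, 'ZipFile')
    def test_file_count(self, mock_zip_file):
        archive = ArchiveFile()
        archive.file = 'does_not_exist.zip'  # never opened: ZipFile is mocked
        members = ['dir/', 'dir/file1', 'dir/file2']
        mock_zip_file.return_value.__enter__.return_value.namelist.return_value = members
        self.assertEqual(2, archive.file_count())
```

No file ever touches the disk; the mock's __enter__ chain stands in for the context manager exactly as described above.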
Upvotes: 0
Reputation: 55962
There are so many different ways to approach this. I like to think about what would be valuable to test; off the top of my head, I can think of a couple of things (for example, the missing-file guard, if self.file == None). This testing could take place on two levels:
Unittest the logic
Unit testing the logic of your archive objects should be trivial. There look to be a couple of tests in your file_count method that could be valuable:
test_no_file_returns_negative_one (error conditions are "hopefully" not very frequently executed code paths, which makes them great candidates for tests, especially if your clients are expecting this -1 return value).
test_zip_file_pruned_logic (this looks to be very important functionality in your code; if implemented incorrectly, it would completely throw off the count your code claims to return).
test_happy_path_file_count_successful (I like to have a unit test that exercises the whole function with a mocked ZipFile dependency, to make sure everything is covered without having to run the integration tests).
Test Integrations
I think a test for each supported archive type would be very valuable. These could be static fixtures that live in your repo; your tests would already know how many files each archive has and would assert on that.
I think all of your concerns are valid, and all of them can be addressed, and tested, in a maintainable way:
I'm counting on the precise location of the test files in relation to the current script file
This could be addressed by the convention of having your file fixtures stored in a subdirectory of your test package, and then using python to get the file path of your test package:
FIXTURE_DIR = os.path.join(os.path.dirname(__file__), 'fixtures')
For portable code it will be important to dynamically generate these paths.
I'll need a large sample of manually created archive files to make sure I'm testing enough variations
Yes, and how many are good enough? At least one test per supported archive type. (Netflix has to test against every single device it has an app on; plenty of companies run tests against a large matrix of mobile devices.) I think the test coverage here is crucial, but try to put all the edge cases that need covering into unit tests.
I'm manually comparing the returned value to what I expect because I know how many files are in the test archive.
The archive will have to become static and your test will store that information.
One thing to keep in mind is the scope of your tests. A test that exercises ZipFile itself wouldn't be very valuable, because it's in the standard library and already has tests. Similarly, testing that your code works on every Python filesystem/OS probably wouldn't be very valuable either, as Python already has those checks.
But scoping your tests to verify that your application works with all the file types it says it supports is, I believe, extremely valuable, because it is a contract between you and your clients, saying "hey, this DOES work, let me show you." In the same way, Python's tests are a contract between it and you, saying "hey, we support OSX/Linux/whatever, let me show you."
Upvotes: 2