Dinero

Reputation: 1160

Using Pytest to test if a file is valid

Background

I am brand new to writing code in Python and know little about testing. As a first step I went through the pytest documentation to get an idea of how to write tests. I have a basic idea of what fixtures are and how to use monkeypatching. However, I am a little stumped because I am unsure how to test my own code, as the examples on the website are not very practical.

My Code

import os

# s3 and s3_resource are project-specific helpers imported elsewhere (not shown)


class Cache:
    def __init__(self):
        pass

    def get_object_etag(self, s3_path: str, file_name: str) -> str:
        bucket, key = s3.deconstruct_s3_path(f"{s3_path}/{file_name}")
        return s3_resource().Object(bucket, key).e_tag

    def file_exists(self, local_path: str, file_name: str) -> bool:
        return os.path.exists(f"{local_path}/{file_name}")

    def cache_file(self, s3_path: str, local_path: str, file_name_on_s3: str) -> None:
        etag_value = self.get_object_etag(s3_path, file_name_on_s3)
        local_file_name = "etag_" + etag_value + "_" + file_name_on_s3
        if not self.file_exists(local_path, local_file_name):
            os.makedirs(local_path, exist_ok=True)
            s3.copy_with_python_retry(
                from_path=f"{s3_path}/{file_name_on_s3}",
                to_path=f"{local_path}/{local_file_name}",
            )
        else:
            print("Cached File is Valid")

I want to test the cache_file() function. It takes the path of a file on S3, a local path, and the file name on S3, and builds the local file name by prefixing the S3 file name with the etag value. At any given time we can check whether that local file exists. If the etag has changed, the file won't exist either, because the local_file_name we construct no longer matches the cached file.

Testing Approach

Let's assume I currently have the local file foo/myfoo/etag_123_my_file.csv

Now let's say I go to S3 and for some reason the etag has changed, so the expected file name becomes etag_124_my_file.csv. In this case the file-exists check fails and I am forced to download the updated file again.

Another test case would be the ideal case where the file name on s3 matches the local file name meaning the cached file is valid.

I am very confused about how to test this, as I am new to testing and pytest and don't have the test-driven mindset.

For instance, do I use monkeypatch and just set attributes like local_file_name and etag_value? I would love it if someone could provide an example. It would help me a lot to get started in the testing world.

Upvotes: 0

Views: 1671

Answers (1)

MrBean Bremen

Reputation: 16815

Provided you mock all the S3 calls (with or without some framework), you could for example mock file_exists to return False, and check that s3.copy_with_python_retry is called, and in another test check the opposite (not called if file_exists returns True).

Here is a crude example (not using any framework) to show what I mean:

from unittest import mock

import pytest

from s3cache import Cache  # assuming Cache is in s3cache.py


@pytest.fixture
def s3_mock():
    with mock.patch('s3cache.s3') as s3_mock:  # assuming s3cache.py imports its S3 helper module under the name "s3"
        yield s3_mock


@pytest.fixture(autouse=True)
def get_object_mock():
    # this is just for convenience - you mock the function as you don't test it
    # autouse=True means it is mocked in all tests
    # return a plain string so the constructed file name is predictable
    with mock.patch('s3cache.Cache.get_object_etag', return_value='123') as object_mock:
        yield object_mock


@mock.patch('os.makedirs')
@mock.patch('s3cache.Cache.file_exists', return_value=False)
def test_s3cache_not_existing(exists_mock, makedirs_mock, s3_mock):
    # file is not cached locally -> directory is created and file is copied
    cache = Cache()
    # you can put more sensible values here
    cache.cache_file("my_path", "local_path", "s3_fname")
    makedirs_mock.assert_called_once_with('local_path', exist_ok=True)
    # you can also check the arguments as above if needed
    s3_mock.copy_with_python_retry.assert_called_once()


@mock.patch('os.makedirs')
@mock.patch('s3cache.Cache.file_exists', return_value=True)
def test_s3cache_exists(exists_mock, makedirs_mock, s3_mock):
    # file is already cached locally -> nothing is created or copied
    cache = Cache()
    cache.cache_file("my_path", "local_path", "s3_fname")
    makedirs_mock.assert_not_called()
    s3_mock.copy_with_python_retry.assert_not_called()

As you don't want to test S3 functionality itself, you mostly have to check that it is called correctly, though you have to decide yourself what exactly you want to test.
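If you would rather exercise the two scenarios from the question (etag changed vs. etag unchanged) against real files, here is a rough, self-contained sketch. It makes several assumptions that are not in your code: the Cache class is a trimmed copy of yours, the module-level s3 object is a MagicMock standing in for your real S3 helper module, and the bucket path and file names are made up. Only get_object_etag is patched; file_exists runs for real against a temporary directory.

```python
import os
import tempfile
from unittest import mock

# Stand-in for the project's s3 helper module (hypothetical; the real one
# is not shown in the question).
s3 = mock.MagicMock(name="s3")


class Cache:
    """Trimmed copy of the class from the question, for illustration only."""

    def get_object_etag(self, s3_path: str, file_name: str) -> str:
        raise NotImplementedError("talks to S3 in the real code")

    def file_exists(self, local_path: str, file_name: str) -> bool:
        return os.path.exists(f"{local_path}/{file_name}")

    def cache_file(self, s3_path: str, local_path: str, file_name_on_s3: str) -> None:
        etag_value = self.get_object_etag(s3_path, file_name_on_s3)
        local_file_name = "etag_" + etag_value + "_" + file_name_on_s3
        if not self.file_exists(local_path, local_file_name):
            os.makedirs(local_path, exist_ok=True)
            s3.copy_with_python_retry(
                from_path=f"{s3_path}/{file_name_on_s3}",
                to_path=f"{local_path}/{local_file_name}",
            )


def test_changed_etag_triggers_download():
    s3.reset_mock()
    with tempfile.TemporaryDirectory() as local_path:
        # A file cached under the old etag already exists locally.
        open(os.path.join(local_path, "etag_123_my_file.csv"), "w").close()
        # S3 now reports a new etag, so the expected local name differs.
        with mock.patch.object(Cache, "get_object_etag", return_value="124"):
            Cache().cache_file("s3://bucket/foo", local_path, "my_file.csv")
        s3.copy_with_python_retry.assert_called_once_with(
            from_path="s3://bucket/foo/my_file.csv",
            to_path=f"{local_path}/etag_124_my_file.csv",
        )


def test_unchanged_etag_skips_download():
    s3.reset_mock()
    with tempfile.TemporaryDirectory() as local_path:
        # The cached file name already carries the current etag.
        open(os.path.join(local_path, "etag_123_my_file.csv"), "w").close()
        with mock.patch.object(Cache, "get_object_etag", return_value="123"):
            Cache().cache_file("s3://bucket/foo", local_path, "my_file.csv")
        s3.copy_with_python_retry.assert_not_called()
```

In a real test suite you would import Cache from your own module and patch the s3 name there, as in the example above; pytest's tmp_path fixture could replace the tempfile handling.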

Upvotes: 1
