unbalanced-bias

Reputation: 11

How to mock boto3 calls when testing a function that calls boto3 in its body

I am trying to test a function called get_date_from_s3(bucket, table) using pytest. This function makes a boto3.client("s3").list_objects_v2() call that I would like to mock during testing, but I can't figure out how to do it.

Here is my directory setup:

my_project/
  glue/
      continuous.py
  tests/
      glue/
          test_continuous.py
          conftest.py
      conftest.py

The code in continuous.py will be executed in an AWS Glue job, but I am testing it locally.

my_project/glue/continuous.py

import sys
from datetime import date as datetime_date

import boto3

def get_date_from_s3(bucket, table):
    s3_client = boto3.client("s3")
    result = s3_client.list_objects_v2(Bucket=bucket, Prefix="Foo/{}/".format(table))

    # [the actual thing I want to test]
    latest_date = datetime_date(1, 1, 1)
    output = None
    for content in result.get("Contents", []):
        key = content["Key"]
        date = key.split("/")
        # [some logic to get the latest date from the file name in s3,
        #  updating latest_date and output]

    return output

def main(argv):
    date = get_date_from_s3(argv[1], argv[2])

if __name__ == "__main__":
    main(sys.argv[1:])
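The placeholder in the loop leaves the date logic unspecified. As a hedged sketch, assuming keys follow the `year=YYYY/month=MM/day=DD/` layout from the sample response and that the goal is the most recent date formatted as `YYYYMMDD`, that logic could be pulled into a pure helper (the name `latest_date_from_keys` is hypothetical), which can then be tested without any mocking at all:

```python
import re
from datetime import date

# Hypothetical helper: assumes keys contain "year=YYYY/month=MM/day=DD" segments
_DATE_PATTERN = re.compile(r"year=(\d{4})/month=(\d{1,2})/day=(\d{1,2})")


def latest_date_from_keys(keys):
    """Return the most recent partition date among keys as "YYYYMMDD", or None."""
    latest = None
    for key in keys:
        match = _DATE_PATTERN.search(key)
        if match is None:
            continue  # skip keys that do not follow the partition layout
        year, month, day = (int(part) for part in match.groups())
        candidate = date(year, month, day)
        if latest is None or candidate > latest:
            latest = candidate
    return latest.strftime("%Y%m%d") if latest is not None else None


keys = [
    "/year=2021/month=01/day=03/some_file.parquet",
    "/year=2021/month=01/day=02/some_file.parquet",
]
print(latest_date_from_keys(keys))  # 20210103
```

Inside `get_date_from_s3()` this would be called as `latest_date_from_keys(c["Key"] for c in result.get("Contents", []))`, which leaves only the S3 call itself to mock.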

my_project/tests/glue/test_continuous.py

This is what I want: I want to test get_date_from_s3() by mocking s3_client.list_objects_v2() and explicitly setting its return value to example_response. I tried something like the following, but it doesn't work:

from glue import continuous
import mock

def test_get_date_from_s3(mocker):
    example_response = {
        "ResponseMetadata": "somethingsomething",
        "IsTruncated": False,
        "Contents": [
            {
                "Key": "/year=2021/month=01/day=03/some_file.parquet",
                "LastModified": "datetime.datetime(2021, 2, 5, 17, 5, 11, tzinfo=tzlocal())",
                ...
            },
            {
                "Key": "/year=2021/month=01/day=02/some_file.parquet",
                "LastModified": ...,
            },
            ...
        ]
    }
    
    mocker.patch(
        'continuous.boto3.client.list_objects_v2',
        return_value=example_response
    )

    expected = "20210102"
    actual = continuous.get_date_from_s3("some_bucket", "some_table")

    assert actual == expected

Note

I noticed that many mocking examples have the functions under test as part of a class. Because continuous.py is a Glue job, I didn't see the point of creating a class; I just have functions and a main() that calls them. Is that bad practice? It seems like mock decorators are only used on functions that are part of a class. I also read about moto, but couldn't figure out how to apply it here.
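On the point about decorators: `unittest.mock.patch` also works as a decorator on an ordinary module-level test function, so no class is required. A minimal, self-contained sketch (the `current_dir_name` function is hypothetical and patches `os.getcwd` purely for illustration):

```python
import os
from unittest.mock import patch


def current_dir_name():
    """Plain module-level function under test; no class involved."""
    return os.path.basename(os.getcwd())


# patch() decorates an ordinary function: the mock is passed in as an extra
# positional argument, and the original os.getcwd is restored afterwards.
@patch("os.getcwd", return_value="/tmp/my_project")
def test_current_dir_name(mock_getcwd):
    assert current_dir_name() == "my_project"
    mock_getcwd.assert_called_once_with()


test_current_dir_name()
```

The `mocker` fixture from pytest-mock likewise works in plain test functions; classes are a matter of test organization, not a requirement of mocking.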

Upvotes: 1

Views: 10596

Answers (3)

Gros Lalo

Reputation: 1088

The idea behind mocking and patching is that you mock/patch something specific, so to patch correctly you have to specify exactly the thing to be mocked/patched. In the given example, the thing to be patched is located at: glue > continuous > boto3 > client instance > list_objects_v2.

As you pointed out, you would like calls to list_objects_v2() to return prepared data. This means you first have to mock "glue.continuous.boto3.client" and then, through the client instance that this mock returns, mock "list_objects_v2".

In practice you need to do something along the lines of:

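from glue import continuous
from unittest.mock import Mock, patch

@patch("glue.continuous.boto3.client")
def test_get_date_from_s3(mocked_client):
    mocked_response = Mock()
    mocked_response.return_value = { ... }
    # boto3.client("s3") returns mocked_client.return_value, so attach the
    # mocked method to that client instance rather than to mocked_client
    mocked_client.return_value.list_objects_v2 = mocked_response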

    # Run other setup and function under test:
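Because the code under test calls `boto3.client("s3")` and then calls `list_objects_v2()` on the returned object, the canned response has to be attached to `mocked_client.return_value`, not to `mocked_client` itself. The self-contained sketch below illustrates the difference; the `boto3` object here is a hypothetical stand-in built with `types.SimpleNamespace` so it runs without AWS or boto3 installed, and `get_keys` is a hypothetical simplified version of `get_date_from_s3`. In a real test the target would be the string "glue.continuous.boto3.client".

```python
import types
from unittest.mock import patch

# Hypothetical stand-in for the boto3 module so the sketch runs offline;
# in the real test you would patch "glue.continuous.boto3.client" instead.
boto3 = types.SimpleNamespace(client=lambda service_name: None)

FAKE_RESPONSE = {
    "Contents": [
        {"Key": "Foo/my_table/year=2021/month=01/day=03/some_file.parquet"},
        {"Key": "Foo/my_table/year=2021/month=01/day=02/some_file.parquet"},
    ]
}


def get_keys(bucket, table):
    """Simplified, hypothetical version of get_date_from_s3()."""
    s3_client = boto3.client("s3")
    result = s3_client.list_objects_v2(Bucket=bucket, Prefix="Foo/{}/".format(table))
    return [content["Key"] for content in result.get("Contents", [])]


def run_with_correct_target():
    with patch.object(boto3, "client") as mocked_client:
        # boto3.client("s3") returns mocked_client.return_value, so configure that
        mocked_client.return_value.list_objects_v2.return_value = FAKE_RESPONSE
        keys = get_keys("my-bucket", "my_table")
        mocked_client.assert_called_once_with("s3")
        mocked_client.return_value.list_objects_v2.assert_called_once_with(
            Bucket="my-bucket", Prefix="Foo/my_table/"
        )
    return keys


def run_with_wrong_target():
    with patch.object(boto3, "client") as mocked_client:
        # Configures an attribute of the factory itself, which the code never calls
        mocked_client.list_objects_v2.return_value = FAKE_RESPONSE
        return get_keys("my-bucket", "my_table")


print(run_with_correct_target())  # both keys from FAKE_RESPONSE
print(run_with_wrong_target())    # [] (the auto-created MagicMock iterates as empty)
```

The wrong-target version does not raise; it silently returns an empty list, because every unconfigured attribute of a `MagicMock` is another `MagicMock`, which is why this kind of mistake is easy to miss.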

Upvotes: 1

Bert Blommers

Reputation: 2123

To achieve this result with moto, you create the test data normally using the boto3 SDK. In other words: write a test case that would succeed against AWS itself, and then slap the moto decorator on it.

For your use case, I imagine it looks something like:

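import boto3
from moto import mock_s3  # moto 5+ replaced the per-service decorators with mock_aws

from glue.continuous import get_date_from_s3

@mock_s3
def test_glue():
    # create test data in the mocked S3
    bucket, table = "some-bucket", "some_table"
    s3 = boto3.client("s3", region_name="us-east-1")
    s3.create_bucket(Bucket=bucket)
    for d in range(1, 6):
        s3.put_object(Bucket=bucket, Key=f"Foo/{table}/year=2021/month=01/day={d:02d}/some_file.parquet", Body="asdf")
    # test
    result = get_date_from_s3(bucket, table)
    # assert result is as expected
    ...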

Upvotes: 2

unbalanced-bias

Reputation: 11

In the end, thanks to @Gros Lalo, I figured out that my patch target was wrong. It should have been 'glue.continuous.boto3.client.list_objects_v'. However, that still didn't work; it raised the error AttributeError: <function client at 0x7fad6f1b2af0> does not have the attribute 'list_objects_v'.

So I did a little refactoring and wrapped the boto3.client call in a function that is easier to mock. Here is my new my_project/glue/continuous.py file:

import sys
from datetime import date as datetime_date

import boto3

def get_s3_objects(bucket, table):
    s3_client = boto3.client("s3")
    return s3_client.list_objects_v2(Bucket=bucket, Prefix="Foo/{}/".format(table))

def get_date_from_s3(bucket, table):
    result = get_s3_objects(bucket, table)

    # [the actual thing I want to test]
    latest_date = datetime_date(1, 1, 1)
    output = None
    for content in result.get("Contents", []):
        key = content["Key"]
        date = key.split("/")
        # [some logic to get the latest date from the file name in s3,
        #  updating latest_date and output]

    return output

def main(argv):
    date = get_date_from_s3(argv[1], argv[2])

if __name__ == "__main__":
    main(sys.argv[1:])

My new test_get_latest_date_from_s3() is therefore:

from glue import continuous

def test_get_latest_date_from_s3(mocker):
    example_response = {
        "ResponseMetadata": "somethingsomething",
        "IsTruncated": False,
        "Contents": [
            {
                "Key": "/year=2021/month=01/day=03/some_file.parquet",
                "LastModified": "datetime.datetime(2021, 2, 5, 17, 5, 11, tzinfo=tzlocal())",
                ...
            },
            {
                "Key": "/year=2021/month=01/day=02/some_file.parquet",
                "LastModified": ...,
            },
            ...
        ]
    }
    mocker.patch('glue.continuous.get_s3_objects', return_value=example_response)

    expected_date = "20190823"
    actual_date = continuous.get_date_from_s3("some_bucket", "some_table")

    assert expected_date == actual_date

The refactoring worked for me, but if there is a way to mock list_objects_v2() directly without wrapping it in another function, I am still interested!
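Another option, which is a different technique from patching, is to let the caller pass in the client and fall back to a real boto3 client only when none is given. Tests can then hand in any object with a `list_objects_v2()` method, with no wrapper function and no patch target to get right. The sketch below assumes this optional `s3_client` parameter (a hypothetical signature change) and a simplified `get_keys` function, and uses a `Mock` in place of the real client so it runs without AWS credentials:

```python
from unittest.mock import Mock


def get_keys(bucket, table, s3_client=None):
    """Hypothetical variant of get_date_from_s3() that accepts an injected client."""
    if s3_client is None:
        # Imported lazily only so this sketch runs without boto3 installed;
        # in the Glue job a module-level "import boto3" is fine.
        import boto3

        s3_client = boto3.client("s3")
    result = s3_client.list_objects_v2(Bucket=bucket, Prefix="Foo/{}/".format(table))
    return [content["Key"] for content in result.get("Contents", [])]


def test_get_keys_with_injected_client():
    fake_client = Mock()
    fake_client.list_objects_v2.return_value = {
        "Contents": [{"Key": "Foo/my_table/year=2021/month=01/day=02/some_file.parquet"}]
    }
    keys = get_keys("my-bucket", "my_table", s3_client=fake_client)
    fake_client.list_objects_v2.assert_called_once_with(
        Bucket="my-bucket", Prefix="Foo/my_table/"
    )
    assert keys == ["Foo/my_table/year=2021/month=01/day=02/some_file.parquet"]


test_get_keys_with_injected_client()
```

The production call site (`get_keys(bucket, table)`) is unchanged, so main() keeps working as before.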

Upvotes: 0
