Mock a function call to an external module

Question

I have a class that does some validation on a pandas dataframe I read. The class looks something like this (simplified, so some of it might make no sense):

import pandas as pd

class PandasValidator:
    read_kwargs = {'sep': '\t', 'header': None}

    def __init__(self, path_to_data: str, max_rows: int) -> None:
        self.path = path_to_data
        self.max_rows = max_rows

    def validate_num_rows(self, threshold: float = 0.1) -> bool:
        df_shape = pd.read_csv(self.path, **self.read_kwargs).shape
        return df_shape[0] * threshold <= self.max_rows

I want to test the method validate_num_rows, so I would like to patch the first line of the function: I don't want to read an actual dataframe when testing it. My test would look something like this (this is not working code, just my best attempt):

@patch('df.read_csv') #not sure what goes in here
def test_validate_num_rows(mock) -> None:
    mock.shape=(30,30)
    result = PandasValidator('dummy-path',30).validate_num_rows(0.1)
    assert result == True

To be honest, I have no idea what to patch and mock, or how to do it. I want to mock the first line of the validate_num_rows method. I know refactoring the code would make testing easier, but that's not a choice I have.

decorator-factory · Accepted Answer

Your class would be easier to test if it accepted a dataframe instead of reading it itself.

import pandas as pd

class PandasValidator:
    def __init__(self, df: pd.DataFrame, max_rows: int) -> None:
        self._df = df
        self._max_rows = max_rows

    def validate_num_rows(self, threshold: float = 0.1) -> bool:
        # shape is a (rows, columns) tuple, so index it to get the row count
        return self._df.shape[0] * threshold <= self._max_rows

Now in your test, you just need to construct a dataframe in memory and pass it to PandasValidator.
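As a minimal sketch of such a test, assuming the refactored class above (repeated here, with the row count taken from `shape[0]`, so the snippet runs on its own). The 30-row dataframe, the column name, and the test names are arbitrary choices made for illustration:

```python
import pandas as pd


class PandasValidator:
    # Copy of the refactored class so this snippet is self-contained.
    def __init__(self, df: pd.DataFrame, max_rows: int) -> None:
        self._df = df
        self._max_rows = max_rows

    def validate_num_rows(self, threshold: float = 0.1) -> bool:
        return self._df.shape[0] * threshold <= self._max_rows


def test_validate_num_rows_accepts_small_frame() -> None:
    df = pd.DataFrame({"a": range(30)})  # 30 rows, built in memory
    assert PandasValidator(df, max_rows=30).validate_num_rows(0.1)


def test_validate_num_rows_rejects_large_frame() -> None:
    df = pd.DataFrame({"a": range(30)})
    # 30 rows * 0.5 = 15, which exceeds max_rows=10
    assert not PandasValidator(df, max_rows=10).validate_num_rows(0.5)
```

No patching is involved here: the test controls the input directly, which is the main payoff of passing the dataframe in.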

Then you can make another function that reads a dataframe from a file:

import pandas as pd
from pathlib import Path

def read_csv(path: Path) -> pd.DataFrame:
    return pd.read_csv(path, sep='\t', header=None)

If you want to test this function, you can use the monkeypatch fixture from pytest:

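import pandas
from pathlib import Path
import your_module


def test_read_csv(monkeypatch):
    # Any small in-memory dataframe works as the stand-in return value.
    expected_dataframe = pandas.DataFrame({0: [1, 2], 1: [3, 4]})

    def fake_read_csv(path, **kwargs):
        assert path == Path('/foo/bar')
        assert kwargs == {'sep': '\t', 'header': None}
        return expected_dataframe

    monkeypatch.setattr(pandas, "read_csv", fake_read_csv)

    actual_dataframe = your_module.read_csv(Path('/foo/bar'))
    # `==` on dataframes is element-wise, so use the pandas comparison helper.
    pandas.testing.assert_frame_equal(actual_dataframe, expected_dataframe)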

Another approach is to not use monkeypatching at all, but to test the function against the real file system using temporary files:

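from pathlib import Path
import your_module

# Illustrative tab-separated content; substitute your own fixture data.
some_predefined_csv = "1\t2\n3\t4\n"

def test_read_csv(tmp_path: Path):
    # tmp_path is pytest's built-in per-test temporary directory fixture.
    csv_path = tmp_path / "my_fake_csv.csv"
    csv_path.write_text(some_predefined_csv)
    dataframe = your_module.read_csv(csv_path)
    # TODO: assert some properties about your dataframe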

Some people would characterize this as an integration test rather than a unit test, because it talks to the real file system. That is arguably a better fit for this function, since it does little besides talk to the file system. See also "Stop Using Mocks (for a while)" by Harry Percival.
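If refactoring really is off the table, as the question states, the original method can still be tested by replacing `read_csv` with `unittest.mock.patch`. The sketch below is an illustration under assumptions, not the asker's actual code: it uses a repaired copy of the class from the question, and it patches `pandas.read_csv` directly. In a real project the patch target would normally be the name as looked up by the module under test, for example a hypothetical `your_module.pd.read_csv`.

```python
from unittest.mock import MagicMock, patch

import pandas as pd


class PandasValidator:
    # Repaired copy of the class from the question (illustration only).
    read_kwargs = {"sep": "\t", "header": None}

    def __init__(self, path_to_data: str, max_rows: int) -> None:
        self.path = path_to_data
        self.max_rows = max_rows

    def validate_num_rows(self, threshold: float = 0.1) -> bool:
        df_shape = pd.read_csv(self.path, **self.read_kwargs).shape
        return df_shape[0] * threshold <= self.max_rows


def test_validate_num_rows() -> None:
    # The fake dataframe only needs a .shape attribute.
    fake_df = MagicMock()
    fake_df.shape = (30, 30)

    # Replace pandas.read_csv for the duration of the with-block only.
    with patch("pandas.read_csv", return_value=fake_df) as mock_read_csv:
        result = PandasValidator("dummy-path", 30).validate_num_rows(0.1)

    assert result is True
    mock_read_csv.assert_called_once_with("dummy-path", sep="\t", header=None)
```

The key difference from the attempt in the question is that `shape` belongs on the object `read_csv` returns, not on the mock that replaces `read_csv` itself.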
