Reputation: 477
I am working on a Python project where we read Parquet files from Azure Data Lake and perform the required operations. We have defined a common Parquet reader function that we use to read such files. I am using MagicMock to mock objects in my test cases. When I ran the tests, some of them failed because they were seeing mocked values set up in other test cases. I don't completely understand this behavior. Below is my code:
utils/common.py
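from typing import Dict, Tuple
import pyarrow.parquet as pq

def parquet_reader(parquet_path: str) -> Tuple[Dict, int]:
    # Read the Parquet file (or directory) and return its columns and row count
    table = pq.read_table(parquet_path)
    return table.to_pydict(), table.num_rows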
-------Test01-------------
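from unittest.mock import MagicMock
from utils import common
# read_from_source is imported from the module under test (not shown)

PARQUET_DATA = {
    "se_vl": ["530"],
    "l_depart": ["028"],
    "r_code": ["f5"],
    "r_dir": ["W"],
}
NUM_RECORDS = 1
TEST_CAF_DATA = (PARQUET_DATA, NUM_RECORDS)

def test_read_from_source():
    # Direct assignment: rebinds the attribute on the utils.common module
    common.parquet_reader = MagicMock(return_value=TEST_CAF_DATA)
    obj = next(read_from_source('path to file'))
    assert obj.se_vl == "530"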
-------Test02-------------
from utils import common
SAMPLE_DATA_PATH = "src/schedule/sample_data"
def test_parquet_reader():
    rows, _ = common.parquet_reader(SAMPLE_DATA_PATH)
    assert rows["key_id"][0] == 345689865
    assert rows["com"][0] == "UP"
When I run all the tests (240 in total), the variable 'rows' in Test02 holds the data from Test01 (PARQUET_DATA). I fixed the issue by using patch, but I am still confused about why MagicMock behaves this way.
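For reference, here is a minimal sketch of what I believe is going on, reduced so that it runs without Parquet files. The names real_parquet_reader, FAKE, test_leaky and test_patched are hypothetical, and the SimpleNamespace called common is only a stand-in for the real utils.common module. As far as I can tell, the difference is not in MagicMock itself: plain assignment rebinds the attribute on the shared module object and nothing undoes it, whereas patch.object puts the original back when its block exits.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock, patch


def real_parquet_reader(path):
    # Hypothetical stand-in for the real reader in utils/common.py
    return {"key_id": [345689865]}, 1


# Stand-in for the utils.common module; a real module object is shared
# by every test in the session in the same way.
common = SimpleNamespace(parquet_reader=real_parquet_reader)

FAKE = ({"se_vl": ["530"]}, 1)


def test_patched():
    # patch.object swaps the attribute only for the duration of the block
    with patch.object(common, "parquet_reader", return_value=FAKE):
        assert common.parquet_reader("x") == FAKE


def test_leaky():
    # Plain assignment rebinds the attribute and nothing restores it
    common.parquet_reader = MagicMock(return_value=FAKE)
    assert common.parquet_reader("x") == FAKE


test_patched()
# The original function is back after the with-block exits
restored_after_patch = common.parquet_reader is real_parquet_reader

test_leaky()
# The mock is still installed, so any later test sees the fake data
leaked_after_assignment = isinstance(common.parquet_reader, MagicMock)
```

If this sketch reflects the real situation, any test that runs after the assignment in the same process gets the mocked return value, which would match what I saw in Test02.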
Upvotes: 0
Views: 166