AndyMoore

Reputation: 1444

unit testing in python - creating dataframes within tests

I am writing unit tests for methods, most of which manipulate DataFrames.

My data comes from API calls, and I have fallen into the trap of using those API calls inside the tests. I feel this does not accurately test the specific components, because a failure could just as easily be in the API call itself.

Would it be better practice to create a dummy DataFrame in each test, and separately test that the API calls return DataFrames of the expected format?

Building DataFrames by hand is a pain. Is there a utility that can convert a DataFrame object in an active console into the code string required to build it?
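
For context, the split I have in mind is roughly the sketch below, where transform and fetch_data are made-up placeholders for one of my methods and for the API call:

import unittest

import pandas as pd
import pandas.testing as pdt


def fetch_data():
    # placeholder for the real API call
    return pd.DataFrame({'a': [1, 2], 'b': [3, 4]})


def transform(df):
    # placeholder for one of the methods under test
    return df.assign(total=df['a'] + df['b'])


class TestTransform(unittest.TestCase):
    def test_adds_total_column(self):
        # dummy frame built by hand - no API call involved
        df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
        expected = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'total': [4, 6]})
        pdt.assert_frame_equal(transform(df), expected)


class TestApiFormat(unittest.TestCase):
    def test_returns_expected_columns(self):
        # separate check that the API returns a frame of the expected shape
        df = fetch_data()
        self.assertIsInstance(df, pd.DataFrame)
        self.assertListEqual(list(df.columns), ['a', 'b'])


if __name__ == '__main__':
    unittest.main()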

Upvotes: 4

Views: 826

Answers (2)

piRSquared

Reputation: 294488

You can build a small, reproducible dummy frame with a fixed random seed:

import pandas as pd
import numpy as np

np.random.seed([3,1415])
df = pd.DataFrame(np.random.randint(10, size=(3, 5)), columns=list('ABCDE'))

df

   A  B  C  D  E
0  0  2  7  3  8
1  7  0  6  8  6
2  0  2  0  4  9

Then you can use to_json and read_json to dump the frame to a plain string and rebuild it:

pd.read_json(df.to_json())

   A  B  C  D  E
0  0  2  7  3  8
1  7  0  6  8  6
2  0  2  0  4  9
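
For the testing use case, one way to use this (just a sketch) is to print df.to_json() once in the console, paste the resulting string into the test module, and rebuild the frame from it:

import pandas as pd
from io import StringIO

# JSON string pasted from df.to_json() in the console (values match the frame above)
FIXTURE_JSON = '{"A":{"0":0,"1":7,"2":0},"B":{"0":2,"1":0,"2":2},"C":{"0":7,"1":6,"2":0},"D":{"0":3,"1":8,"2":4},"E":{"0":8,"1":6,"2":9}}'


def make_fixture_df():
    # rebuild the dummy frame inside the test, no API call needed
    return pd.read_json(StringIO(FIXTURE_JSON))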

Upvotes: 2

Stefan Falk

Reputation: 25447

You can always save a DataFrame to a CSV file (and other formats like pickle):

df.to_csv('my_data.csv')

and of course re-load it:

pd.read_csv('my_data.csv', index_col=0)

Regarding your "test data":

The question is always what functionality you want to test. To me it sounds as if you just want to test the implementation of certain routines for a particular outcome, not the API itself. Since you do not intend to test the API, leave it out of those tests entirely.

If anything, I would write a script that fetches data from the API once, stores it as "test data", and uses that in the unit tests. Alternatively, if you can generate the test data on the fly in a reasonable amount of time, that works as well.
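
For example, a minimal sketch of that idea (fetch_data and the file name are placeholders):

import pandas as pd


def fetch_data():
    # stand-in for the real API call
    return pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})


def save_fixture(path='fixture.csv'):
    # run once: freeze an API response on disk as test data
    fetch_data().to_csv(path)


def load_fixture(path='fixture.csv'):
    # use inside the unit tests instead of hitting the API
    return pd.read_csv(path, index_col=0)


if __name__ == '__main__':
    save_fixture()
    print(load_fixture())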

Upvotes: 2
