Reputation: 1444
I am writing unit tests for methods that often manipulate DataFrames.
My data comes from API calls, and I have fallen into the trap of making those API calls inside the tests. I feel this does not accurately test the specific components, since a failure could just as well be a problem with the API call itself.
Would it be better practice to create a dummy DataFrame in each test, and separately test that the API calls return DataFrames of the expected format?
It's a pain making DataFrames manually; is there a utility that can convert a DataFrame object in an active console into the code string required to build it?
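To illustrate the pattern I mean (transform here is a hypothetical stand-in for one of my methods):
import pandas as pd
from mymodule import transform  # the method under test

def test_transform():
    # small, known input instead of a live API call
    dummy = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
    result = transform(dummy)
    # assumes transform preserves the input columns
    assert list(result.columns) == ['A', 'B']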
Upvotes: 4
Views: 826
Reputation: 294488
import pandas as pd, numpy as np
np.random.seed([3,1415])
df = pd.DataFrame(np.random.randint(10, size=(3, 5)), columns=list('ABCDE'))
df
A B C D E
0 0 2 7 3 8
1 7 0 6 8 6
2 0 2 0 4 9
Then you can use to_json and read_json:
pd.read_json(df.to_json())
A B C D E
0 0 2 7 3 8
1 7 0 6 8 6
2 0 2 0 4 9
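For a unit test, you can run df.to_json() once in a console, paste the resulting string literal into the test module, and rebuild the frame from it. A sketch using the frame above (the test name is illustrative):
# captured once via print(df.to_json())
blob = ('{"A":{"0":0,"1":7,"2":0},"B":{"0":2,"1":0,"2":2},'
        '"C":{"0":7,"1":6,"2":0},"D":{"0":3,"1":8,"2":4},'
        '"E":{"0":8,"1":6,"2":9}}')

def test_fixture_shape():
    fixture = pd.read_json(blob)
    assert sorted(fixture.columns) == list('ABCDE')
    assert len(fixture) == 3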
Upvotes: 2
Reputation: 25447
You can always save a DataFrame to a CSV file (and other formats, like pickle):
df.to_csv('my_data.csv')
and of course re-load it:
pd.read_csv('my_data.csv', index_col=0)
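The same round trip works with pickle, which also preserves dtypes and the index exactly:
df.to_pickle('my_data.pkl')
df = pd.read_pickle('my_data.pkl')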
Regarding your "test data":
The question is always what functionality you want to test. It sounds as if you want to test the implementations of certain routines for a particular outcome, not your API. Since you do not intend to test your API, keep it out of the unit tests entirely.
If anything, I'd write a script that fetches data from the API once, store that as my "test data", and load it in the unit tests. Alternatively, if you can generate your test data on the fly in a reasonable amount of time, you can do that instead.
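A minimal sketch of that approach, where fetch_frame and transform are hypothetical placeholders for your API wrapper and the function under test:
# make_fixture.py - run once to capture real API output as test data
from myclient import fetch_frame  # hypothetical API wrapper
fetch_frame().to_pickle('fixture.pkl')

# test_transform.py - the unit test loads the stored fixture, no API call
import pandas as pd
from mymodule import transform  # hypothetical function under test

def test_transform():
    df = pd.read_pickle('fixture.pkl')
    result = transform(df)
    assert not result.empty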
Upvotes: 2