Reputation: 173
Hey, I am building tests for a small data cleaning script and am having a lot of problems with how to test my dataframes. I have tried using pandas.testing.assert_frame_equal and am currently trying to use df.equals(other_df). However, the assertion fails, even though the data looks like the result I expect when I print it.
Test code:
def test_spliting_the_name_of_stations_and_the_parameters():
    columns = ["PNT_NAME", "TIME", "VALUE", "STATE", "BASE"]
    test_data = [
        {"PNT_NAME": "B710_OVL_M3MIN", "TIME": "2019.01.01 07:00:04", "VALUE": 0.00, "STATE": 0, "BASE": 0},
        {"PNT_NAME": "B710_OVL_M3MIN", "TIME": "2019.01.01 07:01:03", "VALUE": 2.00, "STATE": 0, "BASE": 0}]
    df_test = pandas.DataFrame(data=test_data, columns=columns)
    cleaning = DataCleaning(df_test)
    cleaning.split_station_and_parameter()
    expected_columns = ["PNT_NAME", "PARAMETER_NAME", "TIME", "VALUE", "STATE", "BASE"]
    expected_data = [
        {"PNT_NAME": "B710", "PARAMETER_NAME": "OVL_M3MIN", "TIME": "2019.01.01 07:00:04", "VALUE": 0.00, "STATE": 0, "BASE": 0},
        {"PNT_NAME": "B710", "PARAMETER_NAME": "OVL_M3MIN", "TIME": "2019.01.01 07:01:03", "VALUE": 2.00, "STATE": 0, "BASE": 0}]
    expected_df = pandas.DataFrame(data=expected_data, columns=expected_columns)
    assert cleaning.df.equals(expected_df)
Method that's called:
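class DataCleaning:
    def __init__(self, df) -> None:
        self.df = df

    def split_station_and_parameter(self):
        # Split on the first underscore only; n is passed by keyword because
        # recent pandas versions no longer accept it positionally.
        self.df[['PNT_NAME', 'PARAMETER_NAME']] = self.df['PNT_NAME'].str.split('_', n=1, expand=True)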
What am I doing wrong? Why does the assertion not work? Thanks.
Upvotes: 1
Views: 158
Reputation: 13518
assert cleaning.df.equals(expected_df)
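raises AssertionError because the column order of the two dataframes is not identical. split_station_and_parameter() appends PARAMETER_NAME as the last column of cleaning.df, whereas expected_df has it in second position, and DataFrame.equals returns False when the columns are ordered differently.
If you add the following line (to reorder expected_df) just before the assert statement: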
expected_df = expected_df[["PNT_NAME", "TIME", "VALUE", "STATE", "BASE", "PARAMETER_NAME"]]
Then no exception is raised anymore.
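As an alternative to reordering by hand, here is a minimal sketch of two other ways to make the comparison independent of column order. The frames `actual` and `expected` are small stand-ins invented for illustration (they are not the original test data in full), and whether ignoring column order is acceptable depends on what the test is meant to guarantee.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Stand-in for cleaning.df: the newly created column ends up last.
actual = pd.DataFrame(
    {
        "PNT_NAME": ["B710", "B710"],
        "TIME": ["2019.01.01 07:00:04", "2019.01.01 07:01:03"],
        "VALUE": [0.0, 2.0],
        "PARAMETER_NAME": ["OVL_M3MIN", "OVL_M3MIN"],
    }
)

# Stand-in for expected_df: same data, PARAMETER_NAME in second position.
expected = actual[["PNT_NAME", "PARAMETER_NAME", "TIME", "VALUE"]]

# DataFrame.equals is sensitive to column order.
order_sensitive = actual.equals(expected)

# Option 1: align the actual frame to the expected column order first.
same_after_reorder = actual[expected.columns].equals(expected)

# Option 2: let assert_frame_equal ignore index/column order.
# It raises AssertionError with a descriptive message on a real mismatch.
assert_frame_equal(actual, expected, check_like=True)
```

In a test, assert_frame_equal is often more convenient than equals because a failure reports which column or value differs instead of a bare AssertionError.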
Upvotes: 1