Pelle Martin Hesketh

Reputation: 173

Test not working whilst testing pandas dataframes

Hey, I am building tests for a small data cleaning script and am having a lot of trouble testing my dataframes. I have tried pandas.testing.assert_frame_equal and am currently trying df.equals(other_df). However, the assertion fails, even though the printed data looks like the result I expect.

Test code:

def test_spliting_the_name_of_stations_and_the_parameters():
        columns = ["PNT_NAME", "TIME", "VALUE", "STATE", "BASE"]
        test_data = [
            {"PNT_NAME": "B710_OVL_M3MIN", "TIME": "2019.01.01 07:00:04", "VALUE": 0.00, "STATE": 0, "BASE": 0},
            {"PNT_NAME": "B710_OVL_M3MIN", "TIME": "2019.01.01 07:01:03", "VALUE": 2.00, "STATE": 0, "BASE": 0}]
        df_test = pandas.DataFrame(data=test_data, columns=columns)
        cleaning = DataCleaning(df_test)
        cleaning.split_station_and_parameter()
        expected_columns = ["PNT_NAME", "PARAMETER_NAME", "TIME", "VALUE", "STATE", "BASE"]
        expected_data = [
            {"PNT_NAME": "B710", "PARAMETER_NAME": "OVL_M3MIN", "TIME": "2019.01.01 07:00:04", "VALUE": 0.00, "STATE": 0, "BASE": 0},
            {"PNT_NAME": "B710", "PARAMETER_NAME": "OVL_M3MIN", "TIME": "2019.01.01 07:01:03", "VALUE": 2.00, "STATE": 0, "BASE": 0}]
        expected_df = pandas.DataFrame(data=expected_data, columns=expected_columns)
        assert cleaning.df.equals(expected_df)

Method that's called:

class DataCleaning:

    def __init__(self, df) -> None:
        self.df = df

    def split_station_and_parameter(self):
        # n=1 splits on the first underscore only; pandas 2.0+ requires n as a keyword
        self.df[['PNT_NAME', 'PARAMETER_NAME']] = self.df['PNT_NAME'].str.split('_', n=1, expand=True)

What am I doing wrong? Why does the assertion not work? Thanks

Upvotes: 1

Views: 158

Answers (1)

Laurent

Reputation: 13518

assert cleaning.df.equals(expected_df) raises AssertionError because the column order of your two dataframes is not identical, and equals requires the columns to match in order as well as in content:

  • "PARAMETER_NAME" is the last column of cleaning.df
  • but the second one of expected_df
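To see where the new column ends up, here is a minimal sketch using a shortened, made-up stand-in for df_test (only two columns, for brevity). Assigning to a column that does not exist yet appends it at the end, while the existing PNT_NAME column is overwritten in place:

```python
import pandas as pd

# Hypothetical, shortened stand-in for df_test from the question.
df = pd.DataFrame({
    "PNT_NAME": ["B710_OVL_M3MIN", "B710_OVL_M3MIN"],
    "VALUE": [0.0, 2.0],
})

# Same assignment as in split_station_and_parameter (n passed as a keyword).
df[["PNT_NAME", "PARAMETER_NAME"]] = df["PNT_NAME"].str.split("_", n=1, expand=True)

# PNT_NAME keeps its position; PARAMETER_NAME is new, so it is added last.
column_order = list(df.columns)
print(column_order)
```

Printing list(cleaning.df.columns) and list(expected_df.columns) in the real test is a quick way to spot this kind of mismatch.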

If you add the following line (to reorder expected_df) just before the assert expression:

expected_df = expected_df[["PNT_NAME", "TIME", "VALUE", "STATE", "BASE", "PARAMETER_NAME"]]

Then the assertion passes and no exception is raised anymore.
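If column order does not matter for the test, a possible alternative is pandas.testing.assert_frame_equal with check_like=True, which ignores the order of index and columns. A minimal sketch, where actual and expected are shortened, made-up stand-ins for cleaning.df and expected_df:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

rows = [
    {"PNT_NAME": "B710", "PARAMETER_NAME": "OVL_M3MIN", "VALUE": 0.0},
    {"PNT_NAME": "B710", "PARAMETER_NAME": "OVL_M3MIN", "VALUE": 2.0},
]

# Hypothetical stand-in for cleaning.df: the new column is last.
actual = pd.DataFrame(rows, columns=["PNT_NAME", "VALUE", "PARAMETER_NAME"])
# Hypothetical stand-in for expected_df: the new column is second.
expected = pd.DataFrame(rows, columns=["PNT_NAME", "PARAMETER_NAME", "VALUE"])

# equals is strict about column order, so this comparison is False.
strict_equal = actual.equals(expected)

# check_like=True compares the same data while ignoring column order;
# it raises AssertionError with a readable diff if the data differs.
assert_frame_equal(actual, expected, check_like=True)

# Reordering one frame to match the other makes equals succeed as well.
reordered_equal = actual.equals(expected[actual.columns])
```

Whether to reorder or to use check_like depends on the test: reorder when the column order is part of what you want to verify, use check_like when only the content matters.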

Upvotes: 1
