pandas assert_frame_equal fails to compare two identical dataframes

Question

I am trying to use assert_frame_equal in my tests to compare the dataframe returned by a function with a dataframe read from a csv file. The csv file was created from the dataframe that this function returns:

results = my_fun()
results.to_csv("test.csv", mode="w", sep="\t", index=False)

Therefore, I assume the two should be identical. In the test I have the following code:

import pandas as pd
from pandas.testing import assert_frame_equal

results = my_fun()
test_df = pd.read_csv("test.csv", sep="\t", header="infer", index_col=False, encoding="utf-8")
assert_frame_equal(results.reset_index(drop=True), test_df.reset_index(drop=True), check_column_type=False, check_dtype=False)

What I get is the following exception:

E   AssertionError: DataFrame.iloc[:, 0] (column name="document_id") are different
E
E   DataFrame.iloc[:, 0] (column name="document_id") values are different (100.0 %)
E   [left]:  [1, 1, 1, 2, 2, 2, 2, 2]
E   [right]: [1, 1, 1, 2, 2, 2, 2, 2]

I am scratching my head. What is the actual difference here? If I print results["document_id"] and then test_df["document_id"], I get:

0    1
1    1
2    1
3    2
4    2
5    2
6    2
7    2
Name: document_id, dtype: object 
0    1
1    1
2    1
3    2
4    2
5    2
6    2
7    2
Name: document_id, dtype: int64
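
The two printouts differ only in dtype (object versus int64), which suggests the values may not be the same Python type even though they print identically. Below is a minimal sketch of that situation, assuming the object column holds the strings "1" and "2" (the question does not confirm this). The in-memory buffer and the hand-built results frame are stand-ins for test.csv and my_fun().

```python
import io

import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical stand-in for my_fun(): the ids are strings stored in an object column.
results = pd.DataFrame(
    {"document_id": ["1", "1", "1", "2", "2", "2", "2", "2"]}, dtype=object
)

# Round-trip through tab-separated text, using an in-memory buffer instead of test.csv.
buffer = io.StringIO()
results.to_csv(buffer, sep="\t", index=False)
buffer.seek(0)
test_df = pd.read_csv(buffer, sep="\t")  # read_csv infers int64 for this column

# check_dtype=False skips the dtype check, but the values are still compared,
# and the string "1" is not equal to the integer 1.
try:
    assert_frame_equal(results, test_df, check_dtype=False)
    frames_equal = True
except AssertionError:
    frames_equal = False

# Casting one side to the other's dtypes before comparing makes the values comparable.
aligned = results.astype(test_df.dtypes.to_dict())
assert_frame_equal(aligned, test_df)
```

If this assumption holds for the real data, casting in the test (as above) or passing dtype to read_csv so both sides use the same type should let the comparison pass.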

