Reputation: 6538
I am trying to use assert_frame_equal in tests to compare the DataFrame returned by a function with a DataFrame read from a CSV file. The CSV file was created from the DataFrame that this function returns:
results = my_fun()
results.to_csv("test.csv", mode="w", sep="\t", index=False)
So I would expect the two to be identical. In the test I have the following code:
import pandas as pd
from pandas.testing import assert_frame_equal

results = my_fun()
test_df = pd.read_csv("test.csv", sep="\t", header="infer", index_col=False, encoding="utf-8")
assert_frame_equal(results.reset_index(drop=True), test_df.reset_index(drop=True), check_column_type=False, check_dtype=False)
What I get is the following exception:
E AssertionError: DataFrame.iloc[:, 0] (column name="document_id") are different
E
E DataFrame.iloc[:, 0] (column name="document_id") values are different (100.0 %)
E [left]: [1, 1, 1, 2, 2, 2, 2, 2]
E [right]: [1, 1, 1, 2, 2, 2, 2, 2]
I am scratching my head. What is the actual difference here?
If I print results["document_id"] and test_df["document_id"], I get:
0 1
1 1
2 1
3 2
4 2
5 2
6 2
7 2
Name: document_id, dtype: object <class 'pandas.core.series.Series'>
0 1
1 1
2 1
3 2
4 2
5 2
6 2
7 2
Name: document_id, dtype: int64 <class 'pandas.core.series.Series'>
Upvotes: 3
Views: 2807
Reputation: 3001
What happens if you compare in a different way? E.g.,
results['document_id'] == test_df['document_id']
UPDATE: Question 2: what happens for:
results['document_id'].reset_index(drop=True) == \
test_df['document_id'].reset_index(drop=True)
# and for
results.index == test_df.index
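One guess, given that your printed dtypes are object on one side and int64 on the other: the object column may hold the strings "1" and "2", which print exactly like the integers 1 and 2. Below is a minimal sketch of how the comparison above would surface that. The two frames are made-up stand-ins (not your my_fun output), and the assumption that the object column contains strings is mine, so check it against your real data.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical stand-ins for the frames in the question (assumption):
# `results` stores the ids as strings in an object column,
# `test_df` stores them as int64, which is what read_csv infers.
results = pd.DataFrame(
    {"document_id": pd.Series(["1", "1", "1", "2", "2", "2", "2", "2"], dtype=object)}
)
test_df = pd.DataFrame({"document_id": [1, 1, 1, 2, 2, 2, 2, 2]})

# Element-wise comparison: a string never equals an int, so no row matches.
elementwise = results["document_id"] == test_df["document_id"]
all_differ = not elementwise.any()

# Look at the Python type of each stored value instead of the printed repr.
left_types = set(results["document_id"].map(type))
right_types = set(test_df["document_id"].map(type))

# Casting the object column to the dtype of the CSV column makes the frames comparable.
fixed = results.astype({"document_id": "int64"})
assert_frame_equal(fixed, test_df)
```

If left_types turns out to contain str for your real frame, casting as in the last step (or passing dtype to read_csv so both sides hold the same type) should make the assertion pass. check_dtype=False only relaxes the dtype check; the values themselves still have to be equal.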
Upvotes: 1