KenHBS

Reputation: 7174

Comparing two pandas dataframes with different integer types

I just ran into some weird behaviour comparing the values of two pandas dataframes using pd.DataFrame.equals():

Comparison 1

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df2 = df1.copy()

df1.equals(df2)
# True (obviously)

However, when I change the column type to a different integer format, they will not be considered equal anymore:

df1['a'] = df1['a'].astype(np.int32)
df1.equals(df2)
# False

In the .equals() documentation, they point out that the variables must have the same type, and present an example comparing floats to integers, which doesn't work. I didn't expect this to extend to different types of integers, too.

Comparison 2

When doing the same comparison using ==, it does return True:

(df1 == df2).all().all()   
# True

However, == doesn't assess two missing values as equal to each other.
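
To illustrate the difference, here is a minimal sketch; the frames `left` and `right` are made up for this illustration and are not part of the example above. It compares a frame containing a missing value against a copy of itself, once elementwise with == and once with .equals():

```python
import numpy as np
import pandas as pd

# Hypothetical frames with a missing value in the same position
left = pd.DataFrame({'a': [1.0, np.nan, 3.0]})
right = left.copy()

# Elementwise comparison: NaN == NaN evaluates to False
elementwise = (left == right).all().all()

# .equals() treats NaNs in the same location as equal
whole_frame = left.equals(right)

print(elementwise, whole_frame)
```

So == is lenient about integer width but strict about NaN, while .equals() is the other way round.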

My question

Is there an elegant way to handle missing values as equal, whilst not enforcing the same integer type? The best I can come up with is:

(df1.fillna(0) == df2.fillna(0)).all().all()

but there has to be a more concise and less hacky way to deal with this problem.

My follow-up, opinion-based question: Would you consider this a bug?

Upvotes: 8

Views: 2214

Answers (1)

Sid Kwakkel

Reputation: 799

If you think of this as a decimal problem (i.e. does 2 equal 2), then this perhaps looks like a bug. However, if you look at it the way the interpreter sees it (i.e. does 00000010 equal 0000000000000010), it becomes plain that there is indeed a difference: the same value is stored at two different bit widths.
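
As a rough illustration of that point, the sketch below compares the same value stored as a 32-bit and a 64-bit NumPy integer. The variable names are only for this example, and it shows storage width rather than how pandas implements .equals() internally:

```python
import numpy as np

small = np.int32(2)  # stored in 4 bytes
large = np.int64(2)  # stored in 8 bytes

# Numerically the two values compare equal
same_value = bool(small == large)

# Their raw byte representations differ in length, so they are not identical
same_bytes = small.tobytes() == large.tobytes()

print(small.itemsize, large.itemsize, same_value, same_bytes)
```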

From a validation perspective, it is probably a good idea to make sure you are comparing apples to apples and so I like the answer of @Ben.T:

df1.equals(df2.astype(df1.dtypes))
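
A minimal sketch of how this behaves when the frames differ in integer width and also contain missing values. The float column 'b' with a NaN is an assumption added for illustration; it is not in the question's data:

```python
import numpy as np
import pandas as pd

# Column 'b' holds a NaN (assumed here to exercise the missing-value case)
df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, np.nan, 6.0]})
df2 = df1.copy()

# Give the two frames different integer widths for column 'a'
df1['a'] = df1['a'].astype(np.int32)
df2['a'] = df2['a'].astype(np.int64)

# Strict comparison fails because the dtypes of 'a' differ
strict = df1.equals(df2)

# Casting df2 to df1's dtypes first ignores the width difference,
# and .equals() still treats NaNs in the same location as equal
aligned = df1.equals(df2.astype(df1.dtypes))

print(strict, aligned)
```

One caveat: the cast itself can fail, for example if df2 had a missing value in a column that df1 stores as an integer dtype, since NaN cannot be converted to a plain NumPy integer.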

Is this a bug? That is above my pay grade. You can submit it, and the thinkers surrounding the pandas library can make a decision. It does seem odd that the '==' operator gives different results than the '.equals' method, and that may sway the decision.

Upvotes: 3
