Comparing two pandas dataframes with different integer types

Question

I just ran into some odd behaviour when comparing the values of two pandas DataFrames with pd.DataFrame.equals():

Comparison 1

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df2 = df1.copy()

df1.equals(df2)
# True (obviously)

However, when I cast one column to a different integer type, the two frames are no longer considered equal:

df1['a'] = df1['a'].astype(np.int32)
df1.equals(df2)
# False

The .equals() documentation points out that corresponding columns must have the same dtype, and gives an example in which a frame of floats is not equal to a frame holding the same values as integers. I didn't expect this to extend to different integer types as well.
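A minimal sketch of what .equals() is reacting to, assuming a recent pandas and NumPy: the values in column 'a' are identical, but its dtype differs between the two frames. Casting df1 to df2's dtypes (via DataFrame.astype with a column-to-dtype dict) makes .equals() return True again.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df2 = df1.copy()
df1['a'] = df1['a'].astype(np.int32)

# .equals() checks the dtype of every column, not only the values
print(df1.dtypes['a'], df2.dtypes['a'])  # the two dtypes now differ
print(df1.equals(df2))                   # False

# Casting df1 back to df2's column dtypes makes the frames compare equal again
realigned = df1.astype(df2.dtypes.to_dict())
print(realigned.equals(df2))             # True
```

This cast-first approach is only safe when the conversion is lossless; casting a float column that contains NaN to an integer dtype, for instance, raises an error.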

Comparison 2

When I do the same comparison with ==, it does return True:

(df1 == df2).all().all()
# True

However, == does not treat two missing values (NaN) as equal to each other, whereas .equals() considers NaNs in the same location to be equal.
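A small illustration of the difference, assuming a float column with one missing value: the elementwise == comparison fails at the NaN position, while .equals() accepts NaNs in matching positions as long as the dtypes agree.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [4, 5, 6]})
df2 = df1.copy()

# Elementwise comparison: NaN == NaN evaluates to False, so one cell fails
elementwise_equal = bool((df1 == df2).all().all())

# .equals(): NaNs in the same location are considered equal
method_equal = df1.equals(df2)

print(elementwise_equal)  # False
print(method_equal)       # True
```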

My question

Is there an elegant way to treat missing values as equal without enforcing the same integer type? The best I have come up with is:

(df1.fillna(0) == df2.fillna(0)).all().all()

but this also treats a NaN on one side and a genuine 0 on the other as equal, and there has to be a more concise and less hacky way to deal with this problem.
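One possible alternative, sketched under the assumption that both frames have identical row and column labels (which == requires anyway): build the elementwise == mask and OR it with a mask of positions where both frames are missing. The helper name frames_equal_ignoring_dtype is hypothetical. Unlike the fillna(0) version, it does not treat a NaN on one side as equal to a real 0 on the other.

```python
import numpy as np
import pandas as pd


def frames_equal_ignoring_dtype(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical helper: compare two frames cell by cell, treating NaNs in
    the same position as equal and ignoring dtype differences."""
    # == raises ValueError unless index and columns match, so check them first
    if not (left.index.equals(right.index) and left.columns.equals(right.columns)):
        return False
    same_value = left == right                  # False wherever either side is NaN
    both_missing = left.isna() & right.isna()   # True where both sides are NaN
    return bool((same_value | both_missing).all().all())


df_a = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, np.nan, 6.0]})
df_b = df_a.copy()
df_b['a'] = df_b['a'].astype(np.int32)   # same values, different integer dtype

df_c = df_a.copy()
df_c.loc[1, 'b'] = 0.0                   # a real 0 where df_a has NaN

print(frames_equal_ignoring_dtype(df_a, df_b))          # True
print((df_a.fillna(0) == df_c.fillna(0)).all().all())   # True: fillna(0) hides the NaN/0 difference
print(frames_equal_ignoring_dtype(df_a, df_c))          # False
```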

My follow-up, opinion-based question: would you consider this a bug?
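For context on the design question: pandas' testing utility pandas.testing.assert_frame_equal exposes dtype strictness as an explicit option (check_dtype), which suggests strict dtype checking is a deliberate choice rather than an oversight. Below is a minimal sketch wrapping it in a boolean check, assuming that raising and catching AssertionError is acceptable in your code; the wrapper name frames_match is hypothetical. Note that by default this function compares floating-point columns with a small relative tolerance (rtol) rather than exactly.

```python
import numpy as np
import pandas as pd


def frames_match(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical wrapper: True if the frames match, ignoring dtype differences."""
    try:
        # check_dtype=False relaxes the per-column dtype check that .equals() enforces;
        # NaNs in the same position are treated as equal
        pd.testing.assert_frame_equal(left, right, check_dtype=False)
    except AssertionError:
        return False
    return True


df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, np.nan, 6.0]})
df2 = df1.copy()
df1['a'] = df1['a'].astype(np.int32)

print(df1.equals(df2))         # False: the dtypes of column 'a' differ
print(frames_match(df1, df2))  # True
```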
