Boolean comparison of each value across two data frames

Question

I have two dataframes (read from csvs) with identical columns but likely different rows. I'm trying to produce a third data frame that has the index on the left and a TRUE or FALSE value for each column where there is a match for a given index record. Here is a simplistic example

df1 = pd.DataFrame(np.array([
    ['100', 'a', 1, 'aa'],
    ['101', 'b', 2, 'bb'],
    ['102', 'c', 3, 'cc']]),
    columns=['ID', 'Col1', 'Col2', 'Col3']).set_index('ID')
df2 = pd.DataFrame(np.array([
    ['100', 'a', 1, 'aa'],
    ['101', 'b', 2, 'bb'],
    ['102', 'c', 3, 'cb']]),
    columns=['ID', 'Col1', 'Col2', 'Col3']).set_index('ID')

df3 = pd.DataFrame(np.where(df1==df2,True,False),
                   columns=df1.columns,
                   index=df1.index)
print(df3)

This produces:

ID                    
100  True  True   True
101  True  True   True
102  True  True  False

The issue I have is that the number of records will differ and could be out of order. There could be a row with ID 104 in df1 and a row with ID 105 in df2. This causes a "Can only compare identically-labeled DataFrame objects" error to be thrown.

I'm thinking np.where isn't flexible enough for this? Any advice would be greatly appreciated!

Boolean comparison of each value across two data frames

Answers (1)

Related Questions