ps2005
ps2005

Reputation: 43

Boolean comparison of each value across two data frames

I have two dataframes (read from csvs) with identical columns but likely different rows. I'm trying to produce a third data frame that has the index on the left and a TRUE or FALSE value for each column where there is a match for a given index record. Here is a simplistic example

df1 = pd.DataFrame(np.array([
    ['100', 'a', 1, 'aa'],
    ['101', 'b', 2, 'bb'],
    ['102', 'c', 3, 'cc']]),
    columns=['ID', 'Col1', 'Col2', 'Col3']).set_index('ID')
df2 = pd.DataFrame(np.array([
    ['100', 'a', 1, 'aa'],
    ['101', 'b', 2, 'bb'],
    ['102', 'c', 3, 'cb']]),
    columns=['ID', 'Col1', 'Col2', 'Col3']).set_index('ID')

df3 = pd.DataFrame(np.where(df1==df2,True,False),
                   columns=df1.columns,
                   index=df1.index)
print(df3)

This produces:

ID                    
100  True  True   True
101  True  True   True
102  True  True  False

The issue I have is that the number of records will differ and could be out of order. There could be a row with ID 104 in df1 and a row with ID 105 in df2. This causes a "Can only compare identically-labeled DataFrame objects" error to be thrown.

I'm thinking np.where isn't flexible enough for this? Any advice would be greatly appreciated!

Upvotes: 2

Views: 2187

Answers (1)

BENY
BENY

Reputation: 323236

In that case you may need to use eq:

df1.eq(df2)
      Col1   Col2   Col3
ID                      
100   True   True   True
101   True   True   True
102  False  False  False
105  False  False  False

Upvotes: 4

Related Questions