user9238790

Compare columns of 2 dataframes with a combination of index and row value

There are quite a few similar questions out there, but I am not sure any of them tackles both index and row values. (This is for a binary classification DataFrame.)

I am trying to check that columns with the same name in two DataFrames have the same values and the same index. If they do not, I simply want to raise an error.

Let's say DataFrame df has columns a, b and c and df_orginal has columns from a to z.

How can we first find the columns whose names appear in both DataFrames, and then check that those columns (a, b and c here) match row by row, in both value and index, between df and df_orginal?

All the columns are numerical, which is why I want to compare the combination of index and value.

Demo:

In [1]: df
Out[1]:
   a  b  c  
0  0  1  2  
1  1  2  0  
2  0  1  0  
3  1  1  0  
4  3  1  0  

In [3]: df_orginal
Out[3]:
   a  b  c  d  e  f  g  ......
0  4  3  1  1  0  0  0
1  3  1  2  1  1  2  1
2  1  2  1  1  1  2  1
3  3  4  1  1  1  2  1
4  0  3  0  0  1  1  1

In the above example, for the columns that share a name, compare each (index, value) pair and flag an error if any pair does not match.

Upvotes: 1

Views: 243

Answers (2)

Shivam Gaur

Reputation: 1062

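# Columns present in both DataFrames
common_cols = df.columns.intersection(df_original.columns)

for col in common_cols:
    # Build one "index value" string per row so labels and values are compared together
    df1_ind_val_pair = df[col].index.astype(str) + ' ' + df[col].astype(str)
    df2_ind_val_pair = df_original[col].index.astype(str) + ' ' + df_original[col].astype(str)

    # Assumes both DataFrames carry the same index labels; pandas raises
    # ValueError when comparing Series whose labels differ
    if (df1_ind_val_pair != df2_ind_val_pair).any():
        print('Found one or more unequal (index, value) pairs in col {}'.format(col))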
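
Since the question asks for an error rather than a printed message, here is a minimal sketch of a column-by-column check that raises instead. It is not the code above: it swaps the string comparison for Series.equals, and the function name check_common_columns is hypothetical. Note that Series.equals also requires matching dtypes, so an int column and a float column holding the same numbers count as different.

```python
import pandas as pd


def check_common_columns(df, df_original):
    """Raise ValueError if any shared column differs in index or values.

    Hypothetical helper for illustration; returns the shared column names.
    """
    # Columns present in both DataFrames
    common_cols = df.columns.intersection(df_original.columns)

    # Series.equals is True only if index labels, values and dtype all match
    bad = [col for col in common_cols if not df[col].equals(df_original[col])]
    if bad:
        raise ValueError(
            'Mismatched (index, value) pairs in columns: {}'.format(bad)
        )
    return list(common_cols)
```

Calling check_common_columns(df, df_original) on the sample data from the question would raise, because a, b and c all contain differing values.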

Upvotes: 0

piRSquared

Reputation: 294218

IIUC:

Use pd.DataFrame.align with a join method of inner, then pass the resulting tuple, unpacked, to pd.DataFrame.eq (dfo here stands for the question's df_orginal).

pd.DataFrame.eq(*df.align(dfo, 'inner'))

       a      b      c
0  False  False  False
1  False  False  False
2  False  False  False
3  False  False  False
4  False  False   True

To see rows that have all columns True, filter with this mask:

pd.DataFrame.eq(*df.align(dfo, 'inner')).all(1)

0    False
1    False
2    False
3    False
4    False
dtype: bool

With the sample data, however, the result is empty:

df[pd.DataFrame.eq(*df.align(dfo, 'inner')).all(1)]

Empty DataFrame
Columns: [a, b, c]
Index: []

Same answer, but with clearer code:

def eq(d1, d2):
    d1, d2 = d1.align(d2, 'inner')
    return d1 == d2

eq(df, dfo)

       a      b      c
0  False  False  False
1  False  False  False
2  False  False  False
3  False  False  False
4  False  False   True
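
If an error is wanted rather than a boolean mask, the aligned comparison can be wrapped as in the sketch below. The names mismatched_cells and assert_common_equal are hypothetical, not pandas functions. Two assumptions to keep in mind: an inner align silently drops row and column labels that exist in only one frame, so only shared labels are checked, and NaN never compares equal to NaN.

```python
import numpy as np
import pandas as pd


def mismatched_cells(d1, d2):
    """Return (index label, column label) pairs where the frames differ."""
    # Keep only the row and column labels the two frames share
    a, b = d1.align(d2, join='inner')
    # Positions of cells that are not equal after alignment
    rows, cols = np.nonzero(a.ne(b).to_numpy())
    return [(a.index[r], a.columns[c]) for r, c in zip(rows, cols)]


def assert_common_equal(d1, d2):
    """Raise ValueError if any shared cell differs between d1 and d2."""
    cells = mismatched_cells(d1, d2)
    if cells:
        raise ValueError('Mismatched (index, column) cells: {}'.format(cells))
```

With the sample data, assert_common_equal(df, dfo) would raise, since only the cell at index 4, column c matches.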

Upvotes: 0
