There are quite a few similar questions out there, but I am not sure any of them tackles both index and row values. (This is for a binary classification DataFrame.)
What I am trying to do is check that columns with the same name in two DataFrames have the same values and the same index. If they do not, simply report an error.
Let's say DataFrame df has columns a, b and c, and df_orginal has columns from a to z.
How can we first find the columns that have the same name in both DataFrames, and then check that the contents of those columns (a, b and c) match row by row, in both value and index, between df and df_orginal?
The contents of all the columns are numerical, which is why I want to compare the combination of index and value.
Demo:
In [1]: df
Out[1]:
   a  b  c
0  0  1  2
1  1  2  0
2  0  1  0
3  1  1  0
4  3  1  0
In [3]: df_orginal
Out[3]:
   a  b  c  d  e  f  g ......
0  4  3  1  1  0  0  0
1  3  1  2  1  1  2  1
2  1  2  1  1  1  2  1
3  3  4  1  1  1  2  1
4  0  3  0  0  1  1  1
In the above example, for the columns that share a name, compare each (index, value) pair and flag an error if any pair does not match.
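For anyone who wants to reproduce this, the demo frames can be rebuilt roughly as follows. This is only a sketch: the columns after g are elided in the output above, so only a to g are included here, and the name common is just an illustrative variable.

```python
import pandas as pd

# Sample frames from the demo. df_orginal has more columns in the real data
# (shown as "......"); only a to g are reproduced here.
df = pd.DataFrame({"a": [0, 1, 0, 1, 3],
                   "b": [1, 2, 1, 1, 1],
                   "c": [2, 0, 0, 0, 0]})

df_orginal = pd.DataFrame({
    "a": [4, 3, 1, 3, 0],
    "b": [3, 1, 2, 4, 3],
    "c": [1, 2, 1, 1, 0],
    "d": [1, 1, 1, 1, 0],
    "e": [0, 1, 1, 1, 1],
    "f": [0, 2, 2, 2, 1],
    "g": [0, 1, 1, 1, 1],
})

# Column names present in both frames (illustrative variable name)
common = df.columns.intersection(df_orginal.columns)
print(list(common))
```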
Upvotes: 1
Views: 243
Reputation: 1062
# Columns present in both DataFrames
common_cols = df.columns.intersection(df_original.columns)
for col in common_cols:
    # Build "index value" strings so index and value are compared together
    df1_ind_val_pair = df[col].index.astype(str) + ' ' + df[col].astype(str)
    df2_ind_val_pair = df_original[col].index.astype(str) + ' ' + df_original[col].astype(str)
    if any(df1_ind_val_pair != df2_ind_val_pair):
        print('Found one or more unequal (index, value) pairs in col {}'.format(col))
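Since the question asks for an error rather than a printed message, a variation on the loop above could raise instead. The following is a minimal sketch, not a definitive implementation: check_common_columns and the toy frames are hypothetical names made up for illustration, and it relies on Series.equals, which compares index labels as well as values and also requires matching dtypes.

```python
import pandas as pd

def check_common_columns(left, right):
    """Raise ValueError if any shared column differs in index or values."""
    common_cols = left.columns.intersection(right.columns)
    # Series.equals compares index labels as well as values (and dtype)
    bad = [col for col in common_cols if not left[col].equals(right[col])]
    if bad:
        raise ValueError("Unequal (index, value) pairs in columns: {}".format(bad))
    return list(common_cols)

# Hypothetical toy frames, not the data from the question
small = pd.DataFrame({"a": [0, 1], "b": [1, 2]})
same = pd.DataFrame({"a": [0, 1], "b": [1, 2], "z": [9, 9]})
shifted = same.copy()
shifted.index = [1, 2]  # same values, different index labels

print(check_common_columns(small, same))
try:
    check_common_columns(small, shifted)
except ValueError as exc:
    print(exc)
```

Unlike the string comparison, this also works when the two frames have different index labels, because nothing is compared element-wise across mismatched indexes.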
Upvotes: 0
Reputation: 294218
IIUC:
Use pd.DataFrame.align with a join method of inner. Then pass the resulting tuple, unpacked, to pd.DataFrame.eq:
pd.DataFrame.eq(*df.align(dfo, 'inner'))
       a      b      c
0  False  False  False
1  False  False  False
2  False  False  False
3  False  False  False
4  False  False   True
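To make explicit what align does with an inner join, here is a small sketch on made-up frames (the data and the names left and right are illustrative, not from the question). It keeps only the index labels and columns that both frames share, so labels that exist in just one frame are dropped before the comparison rather than flagged.

```python
import pandas as pd

# Hypothetical frames with partly overlapping index labels and columns
df = pd.DataFrame({"a": [0, 1, 0], "b": [1, 2, 1]}, index=[0, 1, 2])
dfo = pd.DataFrame({"a": [1, 3, 0], "b": [2, 2, 5], "d": [1, 1, 1]},
                   index=[1, 2, 3])

# Inner join on both axes: shared rows (1, 2) and shared columns (a, b)
left, right = df.align(dfo, join="inner")
print(left.index.tolist())
print(left.columns.tolist())

# Cell-by-cell comparison of the aligned frames
print(left.eq(right))
```

If an index label missing from one frame should itself count as an error, comparing df.index with dfo.index separately (for example with Index.equals) would be needed on top of this.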
To see rows that have all columns True, filter with this mask:
pd.DataFrame.eq(*df.align(dfo, 'inner')).all(1)
0 False
1 False
2 False
3 False
4 False
dtype: bool
With the sample data, however, the result will be empty:
df[pd.DataFrame.eq(*df.align(dfo, 'inner')).all(1)]
Empty DataFrame
Columns: [a, b, c]
Index: []
Same answer but with clearer code
def eq(d1, d2):
    # Keep only the shared index labels and columns, then compare cell by cell
    d1, d2 = d1.align(d2, 'inner')
    return d1 == d2
eq(df, dfo)
       a      b      c
0  False  False  False
1  False  False  False
2  False  False  False
3  False  False  False
4  False  False   True
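To go from the boolean frame to something that can be flagged, one option is to list the cells that differ. This is a rough sketch under a few assumptions: mismatched_cells is a hypothetical helper name, numpy is used to locate the False positions, and the frames are the a to c (plus d) part of the sample data.

```python
import numpy as np
import pandas as pd

def mismatched_cells(d1, d2):
    """Return (index label, column) pairs where the aligned cells differ."""
    d1, d2 = d1.align(d2, join="inner")
    mask = d1.ne(d2)
    # Positions of True entries in the "not equal" mask, in row-major order
    rows, cols = np.nonzero(mask.to_numpy())
    return [(d1.index[r], d1.columns[c]) for r, c in zip(rows, cols)]

df = pd.DataFrame({"a": [0, 1, 0, 1, 3],
                   "b": [1, 2, 1, 1, 1],
                   "c": [2, 0, 0, 0, 0]})
dfo = pd.DataFrame({"a": [4, 3, 1, 3, 0],
                    "b": [3, 1, 2, 4, 3],
                    "c": [1, 2, 1, 1, 0],
                    "d": [1, 1, 1, 1, 0]})

bad = mismatched_cells(df, dfo)
print(len(bad))  # 14 of the 15 shared cells differ; only (4, 'c') matches
if bad:
    print("First mismatch at", bad[0])
```

Raising an exception when the list is non-empty would give the error behaviour asked for in the question.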
Upvotes: 0