Reputation: 151
I have two dataframes ( df1 & df2) with same headings.
Both dataframes contains same number of rows.
There are around 20 columns in each dataframes.
The dataframes differs for some columns (any one or more out of 20 columns)
There is a particular column (ssno) which is unique for both.
I need to generate the output(ssno) for those rows which differes in any of the 20 field'.
Please help.
Upvotes: 1
Views: 143
Reputation: 862751
First compare both DataFrame
s and get al least one True
s per rows by any
and then use boolean indexing
for filtering ssno
column:
df1 = pd.DataFrame({'ssno':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[70,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,90,2,4],
'F':list('aaXbbb')})
print (df1)
B C D E F ssno
0 4 70 1 5 a a
1 5 8 3 3 a b
2 4 9 5 6 X c
3 5 4 7 90 b d
4 5 2 1 2 b e
5 4 3 0 4 b f
df2 = pd.DataFrame({'ssno':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')})
print (df2)
B C D E F ssno
0 4 7 1 5 a a
1 5 8 3 3 a b
2 4 9 5 6 a c
3 5 4 7 9 b d
4 5 2 1 2 b e
s = df1.loc[(df1 != df2).any(1), 'ssno']
print (s)
0 a
2 c
3 d
Name: ssno, dtype: object
Detail:
print (df1 != df2)
B C D E F ssno
0 False True False False False False
1 False False False False False False
2 False False False False True False
3 False False False True False False
4 False False False False False False
5 False False False False False False
print ((df1 != df2).any(1))
0 True
1 False
2 True
3 True
4 False
5 False
dtype: bool
Upvotes: 3