Reputation: 986
df 1
-----
id  rank  value  group
0   1     999    A
1   2     3      A
2   3     345    B
3   56    8      C
4   7     54     D

df 2
-----
id  rank  value  group
0   1     111    A
1   5     3      B
2   6     345    B
3   56    11     C
4   7     2      D
5   4     92     E
I have two DataFrames, and I count how many rows differ in each column like this:
df1 = df1.set_index('id') ; df2 = df2.set_index('id')
df1=df1[df1.index.isin(df2.index)]
df2=df2[df2.index.isin(df1.index)]
diff = df1.ne(df2)
diff.sum()
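But I want to add a condition on rank: if both ranks are less than 10, the pair should count as matching (not different), even when the two values differ. Something like:
def within_rank(a, b):
    # Ranks that are both below 10 are treated as equal
    if a < 10 and b < 10:
        return False
    # Otherwise report whether the ranks differ
    return a != b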
Expected output:
rank     0
value    3
group    1
Upvotes: 0
Views: 48
Reputation: 150735
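In addition to the comparison, you can require that the rank in df1 or in df2 is >= 10, so a rank mismatch only counts when at least one of the two ranks is 10 or more: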
diff['rank'] &= (df1['rank'].ge(10) | df2['rank'].ge(10))
diff.sum()
Output:
rank 0
value 3
group 1
dtype: int64
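For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds the two frames from the question and applies the same mask. The index alignment via `Index.intersection` and the cross-check with `Series.combine` are my own additions for illustration (not part of the original code), and `within_rank` is assumed to return False when both ranks are below 10, which is what the expected output implies.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "id": [0, 1, 2, 3, 4],
    "rank": [1, 2, 3, 56, 7],
    "value": [999, 3, 345, 8, 54],
    "group": ["A", "A", "B", "C", "D"],
}).set_index("id")

df2 = pd.DataFrame({
    "id": [0, 1, 2, 3, 4, 5],
    "rank": [1, 5, 6, 56, 7, 4],
    "value": [111, 3, 345, 11, 2, 92],
    "group": ["A", "B", "B", "C", "D", "E"],
}).set_index("id")

# Keep only the ids present in both frames so the comparison is aligned
common = df1.index.intersection(df2.index)
df1, df2 = df1.loc[common], df2.loc[common]

# Element-wise "is different" flags
diff = df1.ne(df2)

# A rank mismatch only counts when at least one of the two ranks is >= 10
diff["rank"] &= df1["rank"].ge(10) | df2["rank"].ge(10)

counts = diff.sum()
print(counts)


# Cross-check (assumption: both ranks below 10 means "not different")
def within_rank(a, b):
    if a < 10 and b < 10:
        return False
    return a != b


# Series.combine applies the function pair by pair; slower, but easy to read
elementwise = df1["rank"].combine(df2["rank"], within_rank).astype(bool)
```

The vectorized mask is the one to use on large frames; the `combine` version is only there to show that it agrees with the row-by-row rule.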
Upvotes: 1