Matt-pow

Reputation: 986

Compare column differences between 2 dataframes

df1

id  rank  value  group
0   1     999    A
1   2     3      A
2   3     345    B
3   56    8      C
4   7     54     D

df2

id  rank  value  group
0   1     111    A
1   5     3      B
2   6     345    B
3   56    11     C
4   7     2      D
5   4     92     E

For the ids present in both dataframes, I counted how many rows differ in each column:

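df1 = df1.set_index('id')  # index both frames by id
df2 = df2.set_index('id')
# keep only the ids present in both frames
df1 = df1[df1.index.isin(df2.index)]
df2 = df2[df2.index.isin(df1.index)]
# True wherever the frames disagree; summing counts mismatches per column
diff = df1.ne(df2)
diff.sum()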
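A self-contained version of the snippet above, with the two sample tables typed in as `pd.DataFrame` literals (the construction is an assumption; the question only shows the printed tables). Without any rank condition, `rank` reports 2 mismatches (ids 1 and 2, where both ranks are below 10), and those are exactly what the rank condition is meant to suppress.

```python
import pandas as pd

# Sample data typed in from the printed tables (assumed construction)
df1 = pd.DataFrame({
    "id": [0, 1, 2, 3, 4],
    "rank": [1, 2, 3, 56, 7],
    "value": [999, 3, 345, 8, 54],
    "group": ["A", "A", "B", "C", "D"],
})
df2 = pd.DataFrame({
    "id": [0, 1, 2, 3, 4, 5],
    "rank": [1, 5, 6, 56, 7, 4],
    "value": [111, 3, 345, 11, 2, 92],
    "group": ["A", "B", "B", "C", "D", "E"],
})

# Index both frames by id
df1 = df1.set_index("id")
df2 = df2.set_index("id")

# Keep only the ids present in both frames (id 5 is dropped from df2)
df1 = df1[df1.index.isin(df2.index)]
df2 = df2[df2.index.isin(df1.index)]

# Element-wise "not equal", then count mismatches per column
diff = df1.ne(df2)
counts = diff.sum()
print(counts)
```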

But I want to add a condition on rank: if both ranks are less than 10, the pair should count as a match (not a difference) even when the two rank values differ.

def within_rank(a, b):
    # both ranks below 10: treat the pair as matching, not as a difference
    if a < 10 and b < 10:
        return False
    return a != b

Expected output:

rank     0
value    3
group    1
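One way to apply a pairwise rule like `within_rank` directly is `Series.combine`, which aligns two Series on their index and calls a Python function on each pair of values. This is a minimal sketch, assuming the reading that matches the expected output (returning `False` means "not different"); `rank1` and `rank2` are hypothetical names for the rank columns of the shared ids. Because it calls a Python function per row, it will be slower than a vectorized boolean mask on large frames.

```python
import pandas as pd

def within_rank(a, b):
    """Return True if the two ranks count as different.

    Pairs where both ranks are below 10 are treated as matching.
    """
    if a < 10 and b < 10:
        return False
    return a != b

# Hypothetical rank columns for the ids shared by df1 and df2 (ids 0-4)
rank1 = pd.Series([1, 2, 3, 56, 7], index=[0, 1, 2, 3, 4], name="rank")
rank2 = pd.Series([1, 5, 6, 56, 7], index=[0, 1, 2, 3, 4], name="rank")

# Series.combine aligns on the index and calls within_rank on each pair
rank_diff = rank1.combine(rank2, within_rank)

# Number of rank differences that survive the "both below 10" rule
print(rank_diff.sum())
```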

Upvotes: 0

Views: 48

Answers (1)

Quang Hoang

Reputation: 150735

You can additionally require that at least one of the two ranks (in df1 or df2) is >= 10:

diff['rank'] &= (df1['rank'].ge(10) | df2['rank'].ge(10))

diff.sum()

Output:

rank     0
value    3
group    1
dtype: int64
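For reference, a self-contained run of this approach on the question's sample data, with the second `ge(10)` check applied to `df2['rank']`. The rank mask is De Morgan's law applied to the question's rule: "not (both ranks < 10)" is the same as "at least one rank >= 10". The dataframe construction is an assumption typed in from the printed tables, and the mask is written as an explicit `diff["rank"] = diff["rank"] & ...` assignment, which does the same thing as `&=` here.

```python
import pandas as pd

# Sample data typed in from the printed tables (assumed construction)
df1 = pd.DataFrame({
    "id": [0, 1, 2, 3, 4],
    "rank": [1, 2, 3, 56, 7],
    "value": [999, 3, 345, 8, 54],
    "group": ["A", "A", "B", "C", "D"],
}).set_index("id")
df2 = pd.DataFrame({
    "id": [0, 1, 2, 3, 4, 5],
    "rank": [1, 5, 6, 56, 7, 4],
    "value": [111, 3, 345, 11, 2, 92],
    "group": ["A", "B", "B", "C", "D", "E"],
}).set_index("id")

# Restrict both frames to the ids they share
df1 = df1[df1.index.isin(df2.index)]
df2 = df2[df2.index.isin(df1.index)]

# True wherever the two frames disagree
diff = df1.ne(df2)

# A rank mismatch only counts if at least one of the two ranks is >= 10,
# i.e. NOT (both ranks < 10)
diff["rank"] = diff["rank"] & (df1["rank"].ge(10) | df2["rank"].ge(10))

result = diff.sum()
print(result)
```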

Upvotes: 1
