spiff
spiff

Reputation: 1495

Pandas: Change between two dataframes

I am trying to find a way to achieve the below: df1 and df2 are 2 example dataframes between which I neeed to find the difference. They have the same columns, but row data can change based on new names added, old removed or data of existing set changed.

df1:

  Col1 Col2  Col3 Col4    Col5
0     1  ABC    94  xxx   apple
1     1  DEF    24  xxx   apple
2     2  ABC    40  yyy  banana
3     3  ABC    74  zzz    pear
4     3  DEF    43  zzz    pear

df2:

   Col1 Col2  Col3 Col4       Col5
0     1  ABC    71  xxx      apple
1     2  PQR    65  yyy     banana
2     3  ABC    86  zzz       pear
3     3  DEF    53  zzz       pear
4     4  PQR    26  mmm  pineapple

output:

   Col1 Col2  Col3 Col4       Col5
0     1  ABC   -23  xxx      apple
1     1  DEF   -24  xxx      apple
2     2  ABC   -40  yyy     banana
3     2  PQR    65  yyy     banana
4     3  ABC    12  zzz       pear
5     3  DEF    10  zzz       pear
6     4  PQR    26  mmm  pineapple

I have tried this and the stackoverflow links that it mentions, but I need the change to only be applicable to specific columns only and append new rows with a new key (Col1 in my example) altogether

Thanks vm for taking a look!

Upvotes: 0

Views: 238

Answers (2)

Aakash Makwana
Aakash Makwana

Reputation: 754

As per my understanding you are doing df2-df1 on Col3, you can try

df2.set_index(['Col1', 'Col2', 'Col4', 'Col5']).sub(df1.set_index(['Col1', 'Col2', 'Col4', 'Col5']), fill_value=0).reset_index()

Upvotes: 2

user2285236
user2285236

Reputation:

If the identifiers are columns C1, C2, C4 and C5, you can set them as index and use .sub:

idx = ['Col1', 'Col2', 'Col4', 'Col5']

df2.set_index(idx).sub(df1.set_index(idx), fill_value=0)
Out[16]: 
                          Col3
Col1 Col2 Col4 Col5           
1    ABC  xxx  apple     -23.0
     DEF  xxx  apple     -24.0
2    ABC  yyy  banana    -40.0
     PQR  yyy  banana     65.0
3    ABC  zzz  pear       12.0
     DEF  zzz  pear       10.0
4    PQR  mmm  pineapple  26.0

You can call reset_index at the end, too:

df2.set_index(idx).sub(df1.set_index(idx), fill_value=0).reset_index()
Out[17]: 
   Col1 Col2 Col4       Col5  Col3
0     1  ABC  xxx      apple -23.0
1     1  DEF  xxx      apple -24.0
2     2  ABC  yyy     banana -40.0
3     2  PQR  yyy     banana  65.0
4     3  ABC  zzz       pear  12.0
5     3  DEF  zzz       pear  10.0
6     4  PQR  mmm  pineapple  26.0

Upvotes: 2

Related Questions