peter_b

Reputation: 21

Comparing matching rows and columns of two pandas DataFrames for differences

I'm wondering what the best way is to compare every cell of two dataframes, restricted to the cells whose row and column exist in both dataframes. As an example:

df1 =

[image: df1 table, not available]

df2 =

[image: df2 table, not available]

My desired output is every cell that changed between the two dataframes, considering only the rows of df2 whose item name also appears in df1 and only the columns of df1 that also exist in df2. In this case:

[image: desired output table, not available]

Any thoughts on how to do this for a larger dataframe without two nested loops are welcome.

Upvotes: 2

Views: 236

Answers (1)

DavidK

Reputation: 2564

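You can use pd.melt to do this: reshape both dataframes to long format (one row per item and column), merge them on item and column, and keep the rows whose values differ.

For example: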

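import pandas as pd

# Two versions of the same table; items B and D change between them
df_before = pd.DataFrame({'item':['A','B','C', 'D'], 'value':[1,2,3,4]})
df_after = pd.DataFrame({'item':['A','B','C', 'D'], 'value':[1,1,3,5]})

# Reshape to long format: one row per (item, column) pair, with the cell in 'value'
melt_before = df_before.melt(id_vars=['item'], value_vars=['value'], var_name='column')
melt_after = df_after.melt(id_vars=['item'], value_vars=['value'], var_name='column')

# Inner merge (the default) pairs up cells that exist in both dataframes
diff = melt_before.merge(melt_after, on=['item', 'column'], suffixes=('_old', '_new'))

# Keep only the cells whose value changed
print(diff[diff['value_old'] != diff['value_new']])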

It prints the following DataFrame:

|--|----|------|---------|---------|
|  |item|column|value_old|value_new|
|--|----|------|---------|---------|
|1 |  B |value |2        |1        |
|3 |  D |value |4        |5        |
|--|----|------|---------|---------|
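The example above has a single value column and identical items in both dataframes. To cover the case in the question (only rows whose item appears in both, only columns present in both), the same melt-and-merge idea can be sketched as below. The sample data, the column names price, stock and color, and the helper name cell_changes are assumptions made up for illustration; they do not come from the question's images.

```python
import pandas as pd

# Hypothetical sample data: df2 lacks item "D", adds item "E",
# and has an extra column ("color") that df1 does not have.
df1 = pd.DataFrame({
    "item": ["A", "B", "C", "D"],
    "price": [1, 2, 3, 4],
    "stock": [10, 20, 30, 40],
})
df2 = pd.DataFrame({
    "item": ["A", "B", "C", "E"],
    "price": [1, 5, 3, 9],
    "stock": [10, 20, 35, 90],
    "color": ["red", "blue", "green", "black"],
})


def cell_changes(before, after, key="item"):
    """Return one row per changed cell, for keys and columns present in both frames."""
    # Only compare columns that exist in both frames (the key itself is excluded).
    shared = [c for c in before.columns if c in after.columns and c != key]
    # Long format: one row per (key, column) pair, with the cell in "value".
    long_before = before.melt(id_vars=[key], value_vars=shared, var_name="column")
    long_after = after.melt(id_vars=[key], value_vars=shared, var_name="column")
    # The default inner merge drops (key, column) pairs missing from either frame.
    merged = long_before.merge(long_after, on=[key, "column"], suffixes=("_old", "_new"))
    # Keep only the cells whose value differs.
    return merged[merged["value_old"] != merged["value_new"]].reset_index(drop=True)


changes = cell_changes(df1, df2)
print(changes)
```

With this sample data, the result contains two rows: price for item B (2 to 5) and stock for item C (30 to 35). Items D and E and the color column are ignored because they appear in only one dataframe. One caveat: NaN != NaN evaluates to True, so a cell that is missing in both dataframes would be reported as changed unless it is filtered out separately.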

Upvotes: 2
