Reputation: 21
I'm wondering what the best way is to compare every cell of two dataframes, restricted to the cells whose rows and columns match those of the first dataframe. As an example:
df1 =
df2 =
My desired output is every cell that changed between the two dataframes, for rows of df2 with the same item name as in df1 and for columns of df1 that also exist in df2. In this case:
Any thoughts on how to do this for a larger dataframe without two nested loops are welcome.
Upvotes: 2
Views: 236
Reputation: 2564
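You can use pd.melt to reshape both dataframes into long format (one row per item/column pair), merge them, and keep only the rows whose values differ.
For example: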
import pandas as pd
df_before = pd.DataFrame({'item':['A','B','C', 'D'], 'value':[1,2,3,4]})
df_after = pd.DataFrame({'item':['A','B','C', 'D'], 'value':[1,1,3,5]})
# Reshape to long format: one row per (item, column) pair
melt_before = df_before.melt(id_vars=['item'], value_vars=['value'], var_name='column')
melt_after = df_after.melt(id_vars=['item'], value_vars=['value'], var_name='column')
# Inner merge pairs up the old and new value of each cell present in both frames
diff = melt_before.merge(melt_after, on=['item', 'column'], suffixes=('_old', '_new'))
# Keep only the cells whose value changed
print(diff[diff['value_old'] != diff['value_new']])
It prints the following DataFrame:
|---|------|--------|-----------|-----------|
|   | item | column | value_old | value_new |
|---|------|--------|-----------|-----------|
| 1 | B    | value  | 2         | 1         |
| 3 | D    | value  | 4         | 5         |
|---|------|--------|-----------|-----------|
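The example above has a single value column and identical items in both frames. For the case in the question (only items present in both frames, only columns of df1 that also exist in df2), the same melt-and-merge idea could be generalized roughly as follows. This is a minimal sketch: the helper name diff_cells, the key column name 'item', and the sample columns price, stock and note are all assumptions made for illustration, since the original df1 and df2 are not shown.

```python
import pandas as pd

def diff_cells(df1, df2, key="item"):
    """Hypothetical helper: list cells that differ between df1 and df2."""
    # Compare only the columns of df1 that also exist in df2 (excluding the key).
    shared = [c for c in df1.columns if c in df2.columns and c != key]
    # Long format: one row per (key, column) pair, value in the default 'value' column.
    old = df1.melt(id_vars=[key], value_vars=shared, var_name="column")
    new = df2.melt(id_vars=[key], value_vars=shared, var_name="column")
    # The default inner merge drops items that appear in only one of the frames.
    merged = old.merge(new, on=[key, "column"], suffixes=("_old", "_new"))
    # Keep only the cells whose value changed.
    changed = merged[merged["value_old"] != merged["value_new"]]
    return changed.reset_index(drop=True)

# Made-up sample data: 'note' exists only in df1, item 'A' only in df1, 'D' only in df2.
df1 = pd.DataFrame({"item": ["A", "B", "C"],
                    "price": [10, 20, 30],
                    "stock": [5, 6, 7],
                    "note": ["x", "y", "z"]})
df2 = pd.DataFrame({"item": ["B", "C", "D"],
                    "price": [20, 35, 40],
                    "stock": [9, 7, 1]})

result = diff_cells(df1, df2)
print(result)
```

With this sample data, only the price of C and the stock of B are reported as changed. A few caveats: the sketch assumes item names are unique within each frame, that no data column is itself named 'value' or 'column', and that missing values need no special handling (NaN compared with NaN counts as a change under !=).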
Upvotes: 2