Reputation: 35
I have 2 identical dataframes apart from the 'value' column, need to get the difference of the 2 dataframes on the 'value' column based on year+name+month columns, and append that to the data set.
x1 = {
"year": ["2018", "2018", "2018", "2018", "2018", "2018"],
"name": ["abc", "xyz", "pqr", "stu", "hij", "efg"],
"month": ["Jan-18", "Feb-18", "Mar-18", "Apr-18", "May-18", "Jun-18"],
"value": [100, 200, 300, 400, 500, 600],
}
x2 = {
"year": ["2019", "2019", "2019", "2019", "2019", "2019"],
"name": ["abc", "xyz", "pqr", "stu", "hij", "efg"],
"month": ["Jan-18", "Feb-18", "Mar-18", "Apr-18", "May-18", "Jun-18"],
"value": [700, 300, 200, 500, 600, 100],
}
y1 = pd.DataFrame(x1).append(pd.DataFrame(x2), ignore_index=True)
print(y1)
result should be like rows 12 & 13
year name month value
0 2018 abc Jan-18 100
1 2018 xyz Feb-18 200
...
...
6 2019 abc Jan-18 700
7 2019 xyz Feb-18 300
...
...
12 diff abc Jan-18 (700-100)
13 diff xyz Feb-18 (300-200)
Upvotes: 2
Views: 1517
Reputation: 10624
You can try this:
df=X1.append(X2)
for i in X1.name:
v1=X1.loc[(X1.name==i),'value']
v2=X2.loc[(X2.name==i),'value']
vdiff=v2-v1
d=X1.loc[(X1.name==i),'month']
df.append({'year':'diff','name':i,'month':d,'value':vdiff},
ignore_index=True)
Upvotes: 0
Reputation: 42916
First of all, when you want to append/concat two dataframes on top of each other, try to use pd.concat
.
Second, we can use df.groupby.diff()
to calculate the difference in a group.
y1 = pd.concat([x1, x2], ignore_index=True)
y1['difference'] = abs(y1.groupby(['name', 'month']).value.diff())
print(y1)
year name month value difference
0 2018 abc Jan-18 100 NaN
1 2018 xyz Feb-18 200 NaN
2 2018 pqr Mar-18 300 NaN
3 2018 stu Apr-18 400 NaN
4 2018 hij May-18 500 NaN
5 2018 efg Jun-18 600 NaN
6 2019 abc Jan-18 700 600.0
7 2019 xyz Feb-18 300 100.0
8 2019 pqr Mar-18 200 100.0
9 2019 stu Apr-18 500 100.0
10 2019 hij May-18 600 100.0
11 2019 efg Jun-18 100 500.0
y_final = y1[y1.difference.notnull()].drop('year', axis=1).reset_index(drop=True)
print(y_final)
name month value difference
0 abc Jan-18 700 600.0
1 xyz Feb-18 300 100.0
2 pqr Mar-18 200 100.0
3 stu Apr-18 500 100.0
4 hij May-18 600 100.0
5 efg Jun-18 100 500.0
Upvotes: 0
Reputation: 323276
Check with groupby
and diff
after sort_values
y2=y1.copy()
y2=y2.sort_values('year')
y2['value']=y2.groupby(['name','month']).value.diff()
y1=y1.append(y2.dropna().assign(year='diff'))
y1
year name month value
0 2018 abc Jan-18 100.0
1 2018 xyz Feb-18 200.0
2 2018 pqr Mar-18 300.0
3 2018 stu Apr-18 400.0
4 2018 hij May-18 500.0
5 2018 efg Jun-18 600.0
6 2019 abc Jan-18 700.0
7 2019 xyz Feb-18 300.0
8 2019 pqr Mar-18 200.0
9 2019 stu Apr-18 500.0
10 2019 hij May-18 600.0
11 2019 efg Jun-18 100.0
6 diff abc Jan-18 600.0
7 diff xyz Feb-18 100.0
8 diff pqr Mar-18 -100.0
9 diff stu Apr-18 100.0
10 diff hij May-18 100.0
11 diff efg Jun-18 -500.0
Upvotes: 1