Reputation: 57
I have two pandas dataframes.
import pandas as pd
data1 = {'id1': [625,625,725,625,725,1130,625,1130,725,1130],
'id2': ['AF','AG','AF','AP','AP','BM','BA','BC','BM','AF'],
'Total': [75,68,33,77,42,25,113,80,72,36]}
df1 = pd.DataFrame(data1, columns = ['id1','id2','Total'])
data2 = {'id1': [625,725,625,625,1130,1130,625,725,1130,625],
'id2': ['AF','AF','AG','AP','AF','AG','BA','BA','BM','BM'],
'Part1': [5,8,3,4,2,6,1,2,6,3]}
df2 = pd.DataFrame(data2, columns = ['id1','id2','Part1'])
And I get these two data frames.
df1 id1 id2 Total
0 625 AF 75
1 625 AG 68
2 725 AF 33
3 625 AP 77
4 725 AP 42
5 1130 BM 25
6 625 BA 113
7 1130 BC 80
8 725 BM 72
9 1130 AF 36
df2 id1 id2 Part1
0 625 AF 5
1 725 AF 8
2 625 AG 3
3 625 AP 4
4 1130 AF 2
5 1130 AG 6
6 625 BA 1
7 725 BA 2
8 1130 BM 6
9 625 BM 3
What I want is to create a third dataframe where I get to perserve each unique combination of id1 and id2 while substracting values in column 'Part1' in df2 from 'Total' in df1, given that each id1 and id2 combination only appears once in either of the dataframes.
For example: The combination of '625' and 'AF' gives a value of 75 in df1, and 5 in df2. What I want is to create a third dataframe where a row would have '625', 'AF', and '70' in three columns.
If one combination appears in df1 but not the df2, we treat it as if it exists in df2 but the value is 0, and vice versa.
Not sure if I described it sufficiently.
Upvotes: 2
Views: 144
Reputation: 862731
Use Series.sub
with fill_value=0
parameter for subtraction with convert columns id1, id2
for MultiIndex
, so subtract is based by these columns:
df = (df1.set_index(['id1','id2'])['Total']
.sub(df2.set_index(['id1','id2'])['Part1'], fill_value=0)
.reset_index(name='new'))
print (df)
id1 id2 new
0 625 AF 70.0
1 625 AG 65.0
2 625 AP 73.0
3 625 BA 112.0
4 625 BM -3.0
5 725 AF 25.0
6 725 AP 42.0
7 725 BA -2.0
8 725 BM 72.0
9 1130 AF 34.0
10 1130 AG -6.0
11 1130 BC 80.0
12 1130 BM 19.0
Upvotes: 3