Shan Jiang
Shan Jiang

Reputation: 57

How to subtract values in one dataframe from the other based on multiple columns?

I have two pandas dataframes.

import pandas as pd
data1 = {'id1': [625,625,725,625,725,1130,625,1130,725,1130], 
        'id2': ['AF','AG','AF','AP','AP','BM','BA','BC','BM','AF'],
        'Total': [75,68,33,77,42,25,113,80,72,36]}
df1 = pd.DataFrame(data1, columns = ['id1','id2','Total'])


data2 = {'id1': [625,725,625,625,1130,1130,625,725,1130,625], 
        'id2': ['AF','AF','AG','AP','AF','AG','BA','BA','BM','BM'],
        'Part1': [5,8,3,4,2,6,1,2,6,3]}

df2 = pd.DataFrame(data2, columns = ['id1','id2','Part1'])

And I get these two data frames.

df1     id1 id2  Total
0   625  AF     75
1   625  AG     68
2   725  AF     33
3   625  AP     77
4   725  AP     42
5  1130  BM     25
6   625  BA    113
7  1130  BC     80
8   725  BM     72
9  1130  AF     36
df2     id1 id2  Part1
0   625  AF      5
1   725  AF      8
2   625  AG      3
3   625  AP      4
4  1130  AF      2
5  1130  AG      6
6   625  BA      1
7   725  BA      2
8  1130  BM      6
9   625  BM      3

What I want is to create a third dataframe where I get to perserve each unique combination of id1 and id2 while substracting values in column 'Part1' in df2 from 'Total' in df1, given that each id1 and id2 combination only appears once in either of the dataframes.

For example: The combination of '625' and 'AF' gives a value of 75 in df1, and 5 in df2. What I want is to create a third dataframe where a row would have '625', 'AF', and '70' in three columns.

If one combination appears in df1 but not the df2, we treat it as if it exists in df2 but the value is 0, and vice versa.

Not sure if I described it sufficiently.

Upvotes: 2

Views: 144

Answers (1)

jezrael
jezrael

Reputation: 862731

Use Series.sub with fill_value=0 parameter for subtraction with convert columns id1, id2 for MultiIndex, so subtract is based by these columns:

df = (df1.set_index(['id1','id2'])['Total']
         .sub(df2.set_index(['id1','id2'])['Part1'], fill_value=0)
         .reset_index(name='new'))
print (df)
     id1 id2    new
0    625  AF   70.0
1    625  AG   65.0
2    625  AP   73.0
3    625  BA  112.0
4    625  BM   -3.0
5    725  AF   25.0
6    725  AP   42.0
7    725  BA   -2.0
8    725  BM   72.0
9   1130  AF   34.0
10  1130  AG   -6.0
11  1130  BC   80.0
12  1130  BM   19.0

Upvotes: 3

Related Questions