swepab
swepab

Reputation: 527

Subtract series value from pandas data frame given multiple index

I have a big data frame in Pandas, table A, with structure like below:

key1   key2     value1 
1      201501     12      
2      201502     4     
3      201503     3      
4      201506     9      
5      201507     15
6      201509     nan 

from table A, colum value1, I want to subtract value2 from table B with apperance like below, using key1 and key2 as joining keys:

key1   key2     value2 
1      201501     11      
3      201503     2
5      201507     14

I want the the following in table A:

key1   key2     value1 
1      201501     1      
2      201502     4     
3      201503     1      
4      201506     9      
5      201507     1
6      201509     nan 

How can I achieve this in a super efficient way? Today I join together the two tables and substrat value1 in A with value2 from B, my questions is if this can be done in a smarter pythonic "look-up" fashion which is more sleek and compact?

Data Frame code below

import numpy as np

tableA= pd.DataFrame({'key1':[1,2,3,4,5,6],
                'key2':[201501,201502,201503,201506,201507,201509],
                'value1':[12,4,3,9,15,np.nan]
                })



tableB= pd.DataFrame({'key1':[1,3,5],
                'key2':[201501,201503,201507],
                'value1':[11,2,14]
                })

Upvotes: 2

Views: 646

Answers (2)

piRSquared
piRSquared

Reputation: 294508

tableA.set_index(keys).value1 \
    .sub(tableB.set_index(keys).value1, fill_value=0) \
    .reset_index()

enter image description here

Upvotes: 1

jezrael
jezrael

Reputation: 863321

You can create DataFrames with MultiIndexes by set_index, then substract by sub and fillna by first DataFrame:

print (tableA.set_index(['key1','key2'])
             .sub(tableB.set_index(['key1','key2']))
             .fillna(tableA.set_index(['key1','key2']))
             .reset_index())

   key1    key2  value1
0     1  201501     1.0
1     2  201502     4.0
2     3  201503     1.0
3     4  201506     9.0
4     5  201507     1.0
5     6  201509     NaN

Another solution with combine_first:

print (tableA.set_index(['key1','key2'])
             .sub(tableB.set_index(['key1','key2']))
             .combine_first(tableA.set_index(['key1','key2']))
             .reset_index())

   key1    key2  value1
0     1  201501     1.0
1     2  201502     4.0
2     3  201503     1.0
3     4  201506     9.0
4     5  201507     1.0
5     6  201509     NaN

Upvotes: 1

Related Questions