Reputation: 527
I have a big data frame in Pandas, table A, with structure like below:
key1 key2 value1 1 201501 12 2 201502 4 3 201503 3 4 201506 9 5 201507 15 6 201509 nan
from table A, colum value1, I want to subtract value2 from table B with apperance like below, using key1 and key2 as joining keys:
key1 key2 value2 1 201501 11 3 201503 2 5 201507 14
I want the the following in table A:
key1 key2 value1 1 201501 1 2 201502 4 3 201503 1 4 201506 9 5 201507 1 6 201509 nan
How can I achieve this in a super efficient way? Today I join together the two tables and substrat value1 in A
with value2 from B
, my questions is if this can be done in a smarter pythonic "look-up" fashion which is more sleek and compact?
Data Frame code below
import numpy as np
tableA= pd.DataFrame({'key1':[1,2,3,4,5,6],
'key2':[201501,201502,201503,201506,201507,201509],
'value1':[12,4,3,9,15,np.nan]
})
tableB= pd.DataFrame({'key1':[1,3,5],
'key2':[201501,201503,201507],
'value1':[11,2,14]
})
Upvotes: 2
Views: 646
Reputation: 294508
tableA.set_index(keys).value1 \
.sub(tableB.set_index(keys).value1, fill_value=0) \
.reset_index()
Upvotes: 1
Reputation: 863321
You can create DataFrames
with MultiIndexes
by set_index
, then substract by sub
and fillna
by first DataFrame
:
print (tableA.set_index(['key1','key2'])
.sub(tableB.set_index(['key1','key2']))
.fillna(tableA.set_index(['key1','key2']))
.reset_index())
key1 key2 value1
0 1 201501 1.0
1 2 201502 4.0
2 3 201503 1.0
3 4 201506 9.0
4 5 201507 1.0
5 6 201509 NaN
Another solution with combine_first
:
print (tableA.set_index(['key1','key2'])
.sub(tableB.set_index(['key1','key2']))
.combine_first(tableA.set_index(['key1','key2']))
.reset_index())
key1 key2 value1
0 1 201501 1.0
1 2 201502 4.0
2 3 201503 1.0
3 4 201506 9.0
4 5 201507 1.0
5 6 201509 NaN
Upvotes: 1