Reputation: 2248
I have a large DataFrame with many columns. I also have a smaller DataFrame with two columns - call them 'label' and 'value', both of which are in the larger DataFrame. I want to replace the 'value' column of the larger DataFrame with the 'value' column of the smaller DataFrame, for rows of the larger DataFrame where 'label' matches a row in the smaller DataFrame.
The logic in terms of a for loop is:
largeDF = pd.DataFrame([['a',1],['b',2],['c',3],['d',4],['e',5]], columns=['label','value']).set_index('label')
smallDF = pd.DataFrame([['d',6],['e',7]], columns=['label','value']).set_index('label')
for label in smallDF.index:
largeDF.loc[label,'value'] = smallDF.loc[label,'value']
Unfortunately the obvious answer of
largeDF['value'] = smallDF['value']
doesn't work because it NaNs the labels that are not in the small DF. Using .map() has a similar problem.
The for loop I'm using seems very unpythonic, unpandas and slow. What is the better way of doing this that I am missing?
Upvotes: 1
Views: 1498
Reputation: 863301
You can use combine_first
:
print (smallDF['value'].combine_first(largeDF['value']))
label
a 1.0
b 2.0
c 3.0
d 6.0
e 7.0
Name: value, dtype: float64
All together:
largeDF = pd.DataFrame([['a',1],['b',2],['c',3],['d',4],['e',5]], columns=['label','value'])
.set_index('label')
largeDF['a'] = 1
largeDF['b'] = 'f'
smallDF = pd.DataFrame([['d',6],['e',7]], columns=['label','value']).set_index('label')
print (largeDF)
value a b
label
a 1 1 f
b 2 1 f
c 3 1 f
d 4 1 f
e 5 1 f
print (smallDF)
value
label
d 6
e 7
largeDF['value'] = smallDF['value'].combine_first(largeDF['value']).astype(int)
print (largeDF)
value a b
label
a 1 1 f
b 2 1 f
c 3 1 f
d 6 1 f
e 7 1 f
Upvotes: 3