Reputation: 7231
I have two large pandas dataframes that I need to join and then fill in missing values:
Dataframe #1 has columns A,B, and C,
and
Dataframe #2 has columns A,B, and C, but with all values from C missing
I am trying to join dataframes #1 and #2 such that where the values in columns A and B are equal in both dataframes, then replace missing value in dataframe #2 with the value in dataframe #1.
I have tried a "brute force" approach by looping through the various dataframes but the volume of data in each of the dataframes makes this a challenge.
Is there an efficient method to do this using pandas and/or numpy?
Also: I am not sure if it matters, but all values are numeric integer values.
Any help is appreciated!
Upvotes: 0
Views: 348
Reputation: 31171
This an operation for merge
:
import pandas as pd
df1 = pd.DataFrame({'A':[1,2,3],'B':[4,5,6],'C':[np.NaN, np.NaN, np.NaN]})
df2 = pd.DataFrame({'A':[1,3],'B':[4,6],'C':[44, 55]})
In [150]: pd.merge(df1.drop(['C'], axis=1), df2, how='left', on=['A','B'])
Out[150]:
A B C
0 1 4 44
1 2 5 NaN
2 3 6 55
Upvotes: 4