Eric Broda
Eric Broda

Reputation: 7231

Complex join between two pandas dataframes

I have two large pandas dataframes that I need to join and then fill in missing values:

Dataframe #1 has columns A,B, and C,

and

Dataframe #2 has columns A,B, and C, but with all values from C missing

I am trying to join dataframes #1 and #2 such that where the values in columns A and B are equal in both dataframes, then replace missing value in dataframe #2 with the value in dataframe #1.

I have tried a "brute force" approach by looping through the various dataframes but the volume of data in each of the dataframes makes this a challenge.

Is there an efficient method to do this using pandas and/or numpy?

Also: I am not sure if it matters, but all values are numeric integer values.

Any help is appreciated!

Upvotes: 0

Views: 348

Answers (1)

Colonel Beauvel
Colonel Beauvel

Reputation: 31171

This an operation for merge:

import pandas as pd

df1 = pd.DataFrame({'A':[1,2,3],'B':[4,5,6],'C':[np.NaN, np.NaN, np.NaN]})

df2 = pd.DataFrame({'A':[1,3],'B':[4,6],'C':[44, 55]})

In [150]: pd.merge(df1.drop(['C'], axis=1), df2, how='left', on=['A','B'])
Out[150]:
   A  B   C
0  1  4  44
1  2  5 NaN
2  3  6  55

Upvotes: 4

Related Questions