psionk
psionk

Reputation: 23

Filling NA values in one dataframe by another based on two columns

What is the most efficient way to fill NA values in one dataframe using another when there are more than one column to match (so in this case city and rooms)?

Example dataframes to be combined and the result dataframe:

import pandas as pd
import numpy as np

d1 = {'city' : ['New York', 'Shanghai', 'Boston', 'Shanghai', 
'Shanghai'],
'rooms': ["1","2","3","2","2"], 'floor': ["4","5","6","10","8"], 'rent': 
 [500, np.nan, 1500, 2000, np.nan]}


d2 = {'city' : ['Shanghai'],
'rooms': ["2"], 'rent': [1000]}

df1 = pd.DataFrame(data = d1)
df2 = pd.DataFrame(data = d2)

result = {'city' : ['New York', 'Shanghai','Boston', 'Shanghai', 
'Shanghai'],
'rooms': ["1","2","3","2","2"], 'floor': ["4","5","6","10","8"], 'rent': 
[500, 1000, 1500, 2000, 1000]}

result_df = pd.DataFrame(data = result)

Upvotes: 1

Views: 499

Answers (1)

sammywemmy
sammywemmy

Reputation: 28644

Set the indices on both columns to align, and fill on the required column. In this case, the common columns are city and rooms:

cols = ['city', 'rooms']

Set index for df1:

df1 = df1.set_index(cols)

Set index for df2 :

df2 = df2.set_index(cols).rent # make it a Series

Fill df1 with df2 and reset index (indexes are good/useful):

df1.fillna({"rent": df2}).reset_index()

       city rooms floor    rent
0  New York     1     4   500.0
1  Shanghai     2     5  1000.0
2    Boston     3     6  1500.0
3  Shanghai     2    10  2000.0
4  Shanghai     2     8  1000.0

Note that this works only if the data from df2 is unique

Upvotes: 1

Related Questions