Reputation: 23
What is the most efficient way to fill NA values in one dataframe using another when there are more than one column to match (so in this case city and rooms)?
Example dataframes to be combined and the result dataframe:
import pandas as pd
import numpy as np
d1 = {'city' : ['New York', 'Shanghai', 'Boston', 'Shanghai',
'Shanghai'],
'rooms': ["1","2","3","2","2"], 'floor': ["4","5","6","10","8"], 'rent':
[500, np.nan, 1500, 2000, np.nan]}
d2 = {'city' : ['Shanghai'],
'rooms': ["2"], 'rent': [1000]}
df1 = pd.DataFrame(data = d1)
df2 = pd.DataFrame(data = d2)
result = {'city' : ['New York', 'Shanghai','Boston', 'Shanghai',
'Shanghai'],
'rooms': ["1","2","3","2","2"], 'floor': ["4","5","6","10","8"], 'rent':
[500, 1000, 1500, 2000, 1000]}
result_df = pd.DataFrame(data = result)
Upvotes: 1
Views: 499
Reputation: 28644
Set the indices on both columns to align, and fill on the required column. In this case, the common columns are city
and rooms
:
cols = ['city', 'rooms']
Set index for df1
:
df1 = df1.set_index(cols)
Set index for df2
:
df2 = df2.set_index(cols).rent # make it a Series
Fill df1 with df2 and reset index (indexes are good/useful):
df1.fillna({"rent": df2}).reset_index()
city rooms floor rent
0 New York 1 4 500.0
1 Shanghai 2 5 1000.0
2 Boston 3 6 1500.0
3 Shanghai 2 10 2000.0
4 Shanghai 2 8 1000.0
Note that this works only if the data from df2 is unique
Upvotes: 1