Reputation: 7733
I have two dataframes as given below
1st dataframe
data_file = pd.DataFrame({'person_id':[1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,3],
'event name': ['Second','First','Second','First','Second','First','Second','First','Second','Second','First','Second','First','Second','First','Second','First','First'],
'ob.date': [np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]
})
2nd dataframe
out_data = pd.DataFrame({'person_id':[1,1,2,2,3,3],'event name':['First','Second','First','Second','First','Second'],
'ob.date': ['23/08/2017','23/08/2017','11/08/2017','31/08/2017','25/08/2017','22/08/2017']})
They 1st dataframe looks like as shown below
The 2nd dataframe looks like as shown below
What I would like to do is map the ob.date
values from out_data
dataframe to data_file
based on person_id
and event name
.
This is what I tried
s = out_data.set_index(['person_id','event name'])['ob.date']
data_file['ob.date'] = data_file[('person_id','event name')].map(s)
Encountered the below error
KeyError: ('person_id', 'event name')
# But merge works well. Is the below correct?
pd.merge(data_file,out_data, on = ['person_id','event name'],how = 'inner')
How can I avoid this and map the date values based on multiple keys and achieve an output like below ?
Upvotes: 1
Views: 395
Reputation: 863226
I think here is better merge with left join:
df = pd.merge(data_file,out_data, on = ['person_id','event name'], how = 'left')
map
is possible, but need tuples from both columns:
s = out_data.set_index(['person_id','event name'])['ob.date']
s.index = s.index.tolist()
print (s)
(1, First) 23/08/2017
(1, Second) 23/08/2017
(2, First) 11/08/2017
(2, Second) 31/08/2017
(3, First) 25/08/2017
(3, Second) 22/08/2017
Name: ob.date, dtype: object
s1 = pd.Series(list(map(tuple, data_file[['person_id','event name']].values.tolist())),
index=data_file.index)
data_file['ob.date'] = s1.map(s)
Or similar:
s1 = data_file.set_index(['person_id','event name']).index.to_series()
s1.index = data_file.index
data_file['ob.date'] = s1.map(s)
print (data_file)
person_id event name ob.date
0 1 Second 23/08/2017
1 1 First 23/08/2017
2 1 Second 23/08/2017
3 1 First 23/08/2017
4 1 Second 23/08/2017
5 1 First 23/08/2017
6 1 Second 23/08/2017
7 2 First 11/08/2017
8 2 Second 31/08/2017
9 2 Second 31/08/2017
10 2 First 11/08/2017
11 3 Second 22/08/2017
12 3 First 25/08/2017
13 3 Second 22/08/2017
14 3 First 25/08/2017
15 3 Second 22/08/2017
16 3 First 25/08/2017
17 3 First 25/08/2017
Upvotes: 2