The Great
The Great

Reputation: 7733

How to map based on multiple keys in pandas without merge

I have two dataframes as given below

1st dataframe

data_file = pd.DataFrame({'person_id':[1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,3],
             'event name': ['Second','First','Second','First','Second','First','Second','First','Second','Second','First','Second','First','Second','First','Second','First','First'],
             'ob.date': [np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]
             })

2nd dataframe

out_data = pd.DataFrame({'person_id':[1,1,2,2,3,3],'event name':['First','Second','First','Second','First','Second'],
                     'ob.date': ['23/08/2017','23/08/2017','11/08/2017','31/08/2017','25/08/2017','22/08/2017']})

They 1st dataframe looks like as shown below

enter image description here

The 2nd dataframe looks like as shown below

enter image description here

What I would like to do is map the ob.date values from out_data dataframe to data_file based on person_id and event name.

This is what I tried

s = out_data.set_index(['person_id','event name'])['ob.date']
data_file['ob.date'] = data_file[('person_id','event name')].map(s)

Encountered the below error

KeyError: ('person_id', 'event name')

# But merge works well. Is the below correct?

pd.merge(data_file,out_data, on = ['person_id','event name'],how = 'inner')

How can I avoid this and map the date values based on multiple keys and achieve an output like below ?

enter image description here

Upvotes: 1

Views: 395

Answers (1)

jezrael
jezrael

Reputation: 863226

I think here is better merge with left join:

df = pd.merge(data_file,out_data, on = ['person_id','event name'], how = 'left')

map is possible, but need tuples from both columns:

s = out_data.set_index(['person_id','event name'])['ob.date']
s.index = s.index.tolist()
print (s)
(1, First)     23/08/2017
(1, Second)    23/08/2017
(2, First)     11/08/2017
(2, Second)    31/08/2017
(3, First)     25/08/2017
(3, Second)    22/08/2017
Name: ob.date, dtype: object

s1 = pd.Series(list(map(tuple, data_file[['person_id','event name']].values.tolist())), 
               index=data_file.index)
data_file['ob.date'] = s1.map(s)

Or similar:

s1 = data_file.set_index(['person_id','event name']).index.to_series()
s1.index = data_file.index
data_file['ob.date'] = s1.map(s)

print (data_file)
   person_id event name     ob.date
0           1     Second  23/08/2017
1           1      First  23/08/2017
2           1     Second  23/08/2017
3           1      First  23/08/2017
4           1     Second  23/08/2017
5           1      First  23/08/2017
6           1     Second  23/08/2017
7           2      First  11/08/2017
8           2     Second  31/08/2017
9           2     Second  31/08/2017
10          2      First  11/08/2017
11          3     Second  22/08/2017
12          3      First  25/08/2017
13          3     Second  22/08/2017
14          3      First  25/08/2017
15          3     Second  22/08/2017
16          3      First  25/08/2017
17          3      First  25/08/2017

Upvotes: 2

Related Questions