Reputation: 2556
I have two data frames as follows:
df1=
date company userDomain keyword pageViews category
2015-12-02 1-800 Contacts glasses.com SAN 2 STORAGE
2015-12-02 1-800 Contacts rhgi.com SAN 3 STORAGE
2015-12-02 100 Percent Fun dialogdesign.ca SAN 1 STORAGE
2015-12-02 101netlink 101netlink.com SAN 8 STORAGE
2015-12-02 1020 nlc.bc.ca SAN 4 STORAGE
df2=
Outcome Job Title Wave
Created Opportunity IT Manager 1.0
Closed Out Prospect/Contact Infrastructure Manager 1.0
NaN IT Director 1.0
NaN Supervisor Technical Support 1.0
Created Opportunity Director of IT Services 1.0
Wave Date userDomain
2016-02-16 15:07:05 dialogdesign.ca
2016-02-16 15:07:05 rhgi.com
2016-02-16 15:07:05 surefire.com
2016-02-16 15:07:05 isd2144.org
2016-02-16 15:07:05 nlc.bc.ca
I would like to add a column in df1
called wave_date
with dates from df2['Wave Date']
for all the df1['userDomain']
is in the df2['userDomain']
If there is no match of userDomain
in both the frames, the value should be nan
. I'm sorry if this is a very naive question but I'm frustrated with my failure. What I'm doing is something like this:
df1['wave_date'] = df1.apply(lambda x: df2['Wave Date'] if x['userDomain'].isin(df2['userDomain']) else np.nan)
I keep getting
IndexError: ('userDomain', 'occurred at index date') Can you please point out the correct to do it? Thanks a lot
Upvotes: 1
Views: 43
Reputation: 294258
m = dict(zip(df2['userDomain'], df2['Wave Date']))
df1.assign(wave_date=df1.userDomain.map(m))
date company userDomain keyword pageViews category wave_date
0 2015-12-02 1-800 Contacts glasses.com SAN 2 STORAGE NaN
1 2015-12-02 1-800 Contacts rhgi.com SAN 3 STORAGE 2016-02-16 15:07:05
2 2015-12-02 100 Percent Fun dialogdesign.ca SAN 1 STORAGE 2016-02-16 15:07:05
3 2015-12-02 101netlink 101netlink.com SAN 8 STORAGE NaN
4 2015-12-02 1020 nlc.bc.ca SAN 4 STORAGE 2016-02-16 15:07:05
Upvotes: 1