Python: Creating column based on a condition from the other dataframe

Question

I have two data frames as follows:

 df1=
   date               company      userDomain   keyword  pageViews category  
  2015-12-02   1-800 Contacts     glasses.com     SAN          2   STORAGE   
  2015-12-02   1-800 Contacts        rhgi.com     SAN          3   STORAGE   
  2015-12-02  100 Percent Fun dialogdesign.ca     SAN          1   STORAGE   
  2015-12-02       101netlink  101netlink.com     SAN          8   STORAGE   
  2015-12-02             1020       nlc.bc.ca     SAN          4   STORAGE  

df2= 
                      Outcome                     Job Title  Wave  
          Created Opportunity                    IT Manager   1.0   
  Closed Out Prospect/Contact        Infrastructure Manager   1.0   
                          NaN                   IT Director   1.0   
                          NaN  Supervisor Technical Support   1.0   
          Created Opportunity       Director of IT Services   1.0   

           Wave Date       userDomain  
 2016-02-16 15:07:05  dialogdesign.ca  
 2016-02-16 15:07:05         rhgi.com  
 2016-02-16 15:07:05     surefire.com  
 2016-02-16 15:07:05      isd2144.org  
 2016-02-16 15:07:05        nlc.bc.ca

I would like to add a column in df1 called wave_date with dates from df2['Wave Date'] for all the df1['userDomain'] is in the df2['userDomain'] If there is no match of userDomain in both the frames, the value should be nan. I'm sorry if this is a very naive question but I'm frustrated with my failure. What I'm doing is something like this:

df1['wave_date'] = df1.apply(lambda x: df2['Wave Date'] if x['userDomain'].isin(df2['userDomain']) else np.nan)

I keep getting

IndexError: ('userDomain', 'occurred at index date') Can you please point out the correct to do it? Thanks a lot

piRSquared · Accepted Answer

m = dict(zip(df2['userDomain'], df2['Wave Date']))
df1.assign(wave_date=df1.userDomain.map(m))

         date          company       userDomain keyword  pageViews category            wave_date
0  2015-12-02   1-800 Contacts      glasses.com     SAN          2  STORAGE                  NaN
1  2015-12-02   1-800 Contacts         rhgi.com     SAN          3  STORAGE  2016-02-16 15:07:05
2  2015-12-02  100 Percent Fun  dialogdesign.ca     SAN          1  STORAGE  2016-02-16 15:07:05
3  2015-12-02       101netlink   101netlink.com     SAN          8  STORAGE                  NaN
4  2015-12-02             1020        nlc.bc.ca     SAN          4  STORAGE  2016-02-16 15:07:05

Python: Creating column based on a condition from the other dataframe

Answers (1)

Related Questions