New Dataframe column based on conditions from different dataframe

Question

I have two dataframes:

dfB = pd.DataFrame({'Initials':['AA','BB','CC','DD','EE','FF'],
                    'Zone': ['Red','Red','Red','Yellow','Yellow','Yellow'], 
                    'Start Date':['1998-01-01','1999-01-01','2005-01-01','2000-01-01','2002-01-01','2010-01-01'],
                    'End Date': ['1998-12-31','2004-12-31','2020-04-22','2001-12-31','2009-12-31','2020-04-22']})

dfA = pd.DataFrame({'Initials':['GG','HH','II','JJ','KK','LL'],
                    'Zone': ['Red','Yellow','Red','Yellow','Red','Yellow'], 
                    'Date': ['2011-02-11','2004-11-03','2010- 04-12','2007-12-03','2008-12-01','2019-06-29']})

I am trying to add a new column to dfA that takes the Date from that dataframe and finds where, in dfB, it falls between the "Start" and "End" Dates columns. The zones must also match. It would then take the Initials from that row in dfB and place it in the new column in dfA.

I'm trying this code to do it:

dfB['Start Date'] = dfB['Start Date'].apply(lambda x: time.strptime(x, "%Y-%m-%d"))
dfB['End Date'] = dfB['End Date'].apply(lambda x: time.strptime(x, "%Y-%m-%d"))
dfA['Date'] = dfA['Date'].apply(lambda x: time.strptime(x, "%Y-%m-%d"))

for i,row in dfA.iterrows():
    a = row['Zone']
    dfC = dfB[dfB['Zone']==a]
    date = row['Date']
    dfA['New'] = dfC.loc[(dfC['Start Date'] < date) & (dfC['End Date'] > date),['Initials']]

I did the date.time function in order to make sure that my dates were all able to be compared. Not sure it was necessary, but felt it couldn't hurt.

I then iterrated over each row, made a filtered dataframe based on zone (the zones have to match), and then attempted to find the new initials based on where the dates fall.

The problem is that my results all come back as NaN. As you can see, the dates should match up. GG should match up with CC, HH should be EE, etc. Not sure what I am missing here?

Mayank Porwal · Accepted Answer

Something like this could work:

In [2247]: d = pd.merge(dfA, dfB, on='Zone')
In [2264]: r = d[d.Date.between(d['Start Date'], d['End Date'])] 
In [2279]: dfA['new_initials']  = pd.merge(dfA, r[['Initials_x','Initials_y']], left_on='Initials', right_on=['Initials_x'])['Initials_y']

In [2280]: dfA                                                                                                                                                                                              
Out[2280]: 
  Initials    Zone       Date new_initials
0       GG     Red 2011-02-11           CC
1       HH  Yellow 2004-11-03           EE
2       II     Red 2010-04-12           CC
3       JJ  Yellow 2007-12-03           EE
4       KK     Red 2008-12-01           CC
5       LL  Yellow 2019-06-29           FF

New Dataframe column based on conditions from different dataframe

Answers (1)

Related Questions