thxclown
thxclown

Reputation: 37

Getting the date of the last observation that satisfies a certain condition

Given the following pandas data frame.

    Firm     Date         Work_day
0   1        2016-11-17   1.0      
1   1        2016-11-18   1.0      
2   1        2016-11-19   0.0      
3   1        2016-11-20   0.0     
4   1        2016-11-21   1.0      
5   1        2016-11-22   1.0      
6   2        2016-12-21   0.0      
7   2        2016-12-22   1.0      
8   2        2016-12-23   0.0      
9   2        2016-12-24   0.0     

How would I be able to add the following column "Last_working_day", which either is simply the current date (if Work_day == 1) or gives the date of the last obs. of the respective firm that has a "Work_day" value of 1.0 (Work_day is a dummy variables for working days here). So the resulting table would look like this.

    Firm     Date         Work_day Last_working_day
0   1        2016-11-17   1.0      2016-11-17
1   1        2016-11-18   1.0      2016-11-18
2   1        2016-11-19   0.0      2016-11-18
3   1        2016-11-20   0.0      2016-11-18
4   1        2016-11-21   1.0      2016-11-21
5   1        2016-11-22   1.0      2016-11-22
6   2        2016-12-21   0.0      NaN
7   2        2016-12-22   1.0      2016-12-22
8   2        2016-12-23   0.0      2016-12-22
9   2        2016-12-24   0.0      2016-12-22

Many thanks in Advance!

Upvotes: 0

Views: 153

Answers (2)

jezrael
jezrael

Reputation: 862921

Replace datetimes by mssing values if 0 in Work_day with Series.mask and last call GroupBy.ffill:

#if necessary doering per groups
#df = df.sort_values(['Firm','Date'])

df['Last_working_day'] = df['Date'].mask(df['Work_day'].eq(0)).groupby(df['Firm']).ffill()
print (df)
   Firm        Date  Work_day Last_working_day
0     1  2016-11-17       1.0       2016-11-17
1     1  2016-11-18       1.0       2016-11-18
2     1  2016-11-19       0.0       2016-11-18
3     1  2016-11-20       0.0       2016-11-18
4     1  2016-11-21       1.0       2016-11-21
5     1  2016-11-22       1.0       2016-11-22
6     2  2016-12-21       0.0              NaN
7     2  2016-12-22       1.0       2016-12-22
8     2  2016-12-23       0.0       2016-12-22
9     2  2016-12-24       0.0       2016-12-22

Upvotes: 0

Erfan
Erfan

Reputation: 42916

We use Series.where to convert all dates to NaN where Work_day = 0.

Then we use GroupBy.ffill to fill the gaps per firm.

df['Last_working_day'] = df['Date'].where(df['Work_day'].eq(1))
df['Last_working_day'] = df.groupby('Firm')['Last_working_day'].ffill()

   Firm        Date  Work_day Last_working_day
0     1  2016-11-17       1.0       2016-11-17
1     1  2016-11-18       1.0       2016-11-18
2     1  2016-11-19       0.0       2016-11-18
3     1  2016-11-20       0.0       2016-11-18
4     1  2016-11-21       1.0       2016-11-21
5     1  2016-11-22       1.0       2016-11-22
6     2  2016-12-21       0.0              NaN
7     2  2016-12-22       1.0       2016-12-22
8     2  2016-12-23       0.0       2016-12-22
9     2  2016-12-24       0.0       2016-12-22

Note that Date has to be sorted ascending (which it is in this case), else sort beforehand:

df = df.sort_values('Date')

Upvotes: 2

Related Questions