Reputation: 37
Given the following pandas data frame.
Firm Date Work_day
0 1 2016-11-17 1.0
1 1 2016-11-18 1.0
2 1 2016-11-19 0.0
3 1 2016-11-20 0.0
4 1 2016-11-21 1.0
5 1 2016-11-22 1.0
6 2 2016-12-21 0.0
7 2 2016-12-22 1.0
8 2 2016-12-23 0.0
9 2 2016-12-24 0.0
How would I be able to add the following column "Last_working_day", which either is simply the current date (if Work_day == 1
) or gives the date of the last obs. of the respective firm that has a "Work_day" value of 1.0
(Work_day is a dummy variables for working days here). So the resulting table would look like this.
Firm Date Work_day Last_working_day
0 1 2016-11-17 1.0 2016-11-17
1 1 2016-11-18 1.0 2016-11-18
2 1 2016-11-19 0.0 2016-11-18
3 1 2016-11-20 0.0 2016-11-18
4 1 2016-11-21 1.0 2016-11-21
5 1 2016-11-22 1.0 2016-11-22
6 2 2016-12-21 0.0 NaN
7 2 2016-12-22 1.0 2016-12-22
8 2 2016-12-23 0.0 2016-12-22
9 2 2016-12-24 0.0 2016-12-22
Many thanks in Advance!
Upvotes: 0
Views: 153
Reputation: 862921
Replace datetimes by mssing values if 0
in Work_day
with Series.mask
and last call GroupBy.ffill
:
#if necessary doering per groups
#df = df.sort_values(['Firm','Date'])
df['Last_working_day'] = df['Date'].mask(df['Work_day'].eq(0)).groupby(df['Firm']).ffill()
print (df)
Firm Date Work_day Last_working_day
0 1 2016-11-17 1.0 2016-11-17
1 1 2016-11-18 1.0 2016-11-18
2 1 2016-11-19 0.0 2016-11-18
3 1 2016-11-20 0.0 2016-11-18
4 1 2016-11-21 1.0 2016-11-21
5 1 2016-11-22 1.0 2016-11-22
6 2 2016-12-21 0.0 NaN
7 2 2016-12-22 1.0 2016-12-22
8 2 2016-12-23 0.0 2016-12-22
9 2 2016-12-24 0.0 2016-12-22
Upvotes: 0
Reputation: 42916
We use Series.where
to convert all dates to NaN
where Work_day = 0
.
Then we use GroupBy.ffill
to fill the gaps per firm.
df['Last_working_day'] = df['Date'].where(df['Work_day'].eq(1))
df['Last_working_day'] = df.groupby('Firm')['Last_working_day'].ffill()
Firm Date Work_day Last_working_day
0 1 2016-11-17 1.0 2016-11-17
1 1 2016-11-18 1.0 2016-11-18
2 1 2016-11-19 0.0 2016-11-18
3 1 2016-11-20 0.0 2016-11-18
4 1 2016-11-21 1.0 2016-11-21
5 1 2016-11-22 1.0 2016-11-22
6 2 2016-12-21 0.0 NaN
7 2 2016-12-22 1.0 2016-12-22
8 2 2016-12-23 0.0 2016-12-22
9 2 2016-12-24 0.0 2016-12-22
Note that Date
has to be sorted ascending (which it is in this case), else sort beforehand:
df = df.sort_values('Date')
Upvotes: 2