Reputation: 179
I need to add a column "next_holiday" computing days til next holiday. Below is the expected output once the column is added:
SalesDate holiday next_holiday
2016-12-20 0 5
2016-12-21 0 4
2016-12-22 0 3
2016-12-23 0 2
2016-12-24 0 1
2016-12-25 1 0
Upvotes: 3
Views: 235
Reputation: 4940
It is possible to avoid the groupby()
, by using the index of the DataFrame.
1. Using RangeIndex
Assuming the following dataframe:
import pandas as pd
df = pd.DataFrame({
'SalesDate': ['2016-12-20', '2016-12-21', '2016-12-22', '2016-12-23', '2016-12-24', '2016-12-25', '2016-12-26', '2016-12-27'],
'holiday': [0, 0, 0, 0, 0, 1, 0, 0]
})
df
SalesDate holiday
0 2016-12-20 0
1 2016-12-21 0
2 2016-12-22 0
3 2016-12-23 0
4 2016-12-24 0
5 2016-12-25 1
6 2016-12-26 0
7 2016-12-27 0
One-line solution:
If the DataFrame uses a standard RangeIndex, you could do arithmetic using df.index.where().bfill()
and df.index
:
df['next_holiday'] = pd.Series(df.index.where(df.holiday == 1), dtype='Int32').fillna(method='bfill') - df.index # Note: dtype'Int32' nullable int is new in pandas 0.24
Result:
df
SalesDate holiday next_holiday
0 2016-12-20 0 5
1 2016-12-21 0 4
2 2016-12-22 0 3
3 2016-12-23 0 2
4 2016-12-24 0 1
5 2016-12-25 1 0
6 2016-12-26 0 NaN
7 2016-12-27 0 NaN
2. Using DateTimeIndex
If SalesDate
is the index column (datetime64
type), the solution is similar:
df = df.set_index(pd.to_datetime(df.SalesDate)).drop(columns=['SalesDate'])
df
holiday
SalesDate
2016-12-20 0
2016-12-21 0
2016-12-22 0
2016-12-23 0
2016-12-24 0
2016-12-25 1
2016-12-26 0
2016-12-27 0
Solution with date arithmetic:
df['next_holiday'] = ((pd.Series(df.index.where(df.holiday == 1)).fillna(method='bfill') - df.index) / np.timedelta64(1, 'D'))
df['next_holiday'] = df['next_holiday'].astype('Int32') # pandas >= 0.24 for the nullable integer cast
Result:
df
holiday next_holiday
SalesDate
2016-12-20 0 5
2016-12-21 0 4
2016-12-22 0 3
2016-12-23 0 2
2016-12-24 0 1
2016-12-25 1 0
2016-12-26 0 NaN
2016-12-27 0 NaN
Upvotes: 1
Reputation: 1216
Here's a two-line suggestion:
temp = df.holiday.iloc[::-1].eq(1).cumsum()
df['next_holiday'] = temp.groupby(temp).cumcount()
Upvotes: 4