Rob
Rob

Reputation: 179

pandas add days before holiday

I need to add a column "next_holiday" computing days til next holiday. Below is the expected output once the column is added:

SalesDate holiday   next_holiday
2016-12-20  0             5
2016-12-21  0             4
2016-12-22  0             3
2016-12-23  0             2
2016-12-24  0             1
2016-12-25  1             0

Upvotes: 3

Views: 235

Answers (2)

Freek Wiekmeijer
Freek Wiekmeijer

Reputation: 4940

It is possible to avoid the groupby(), by using the index of the DataFrame.

1. Using RangeIndex

Assuming the following dataframe:

import pandas as pd

df = pd.DataFrame({
    'SalesDate': ['2016-12-20', '2016-12-21', '2016-12-22', '2016-12-23', '2016-12-24', '2016-12-25', '2016-12-26', '2016-12-27'],
    'holiday': [0, 0, 0, 0, 0, 1, 0, 0]
})

df

    SalesDate  holiday
0  2016-12-20        0
1  2016-12-21        0
2  2016-12-22        0
3  2016-12-23        0
4  2016-12-24        0
5  2016-12-25        1
6  2016-12-26        0
7  2016-12-27        0

One-line solution:

If the DataFrame uses a standard RangeIndex, you could do arithmetic using df.index.where().bfill() and df.index:

df['next_holiday'] = pd.Series(df.index.where(df.holiday == 1), dtype='Int32').fillna(method='bfill') - df.index  # Note: dtype'Int32' nullable int is new in pandas 0.24

Result:

df 
    SalesDate  holiday  next_holiday
0  2016-12-20        0             5
1  2016-12-21        0             4
2  2016-12-22        0             3
3  2016-12-23        0             2
4  2016-12-24        0             1
5  2016-12-25        1             0
6  2016-12-26        0           NaN
7  2016-12-27        0           NaN

2. Using DateTimeIndex

If SalesDate is the index column (datetime64 type), the solution is similar:

df = df.set_index(pd.to_datetime(df.SalesDate)).drop(columns=['SalesDate'])
df
            holiday
SalesDate          
2016-12-20        0
2016-12-21        0
2016-12-22        0
2016-12-23        0
2016-12-24        0
2016-12-25        1
2016-12-26        0
2016-12-27        0

Solution with date arithmetic:

df['next_holiday'] = ((pd.Series(df.index.where(df.holiday == 1)).fillna(method='bfill') - df.index) / np.timedelta64(1, 'D'))
df['next_holiday'] = df['next_holiday'].astype('Int32')  # pandas >= 0.24 for the nullable integer cast

Result:

df
            holiday  next_holiday
SalesDate                        
2016-12-20        0             5
2016-12-21        0             4
2016-12-22        0             3
2016-12-23        0             2
2016-12-24        0             1
2016-12-25        1             0
2016-12-26        0           NaN
2016-12-27        0           NaN

Upvotes: 1

bexi
bexi

Reputation: 1216

Here's a two-line suggestion:

temp = df.holiday.iloc[::-1].eq(1).cumsum()   
df['next_holiday'] = temp.groupby(temp).cumcount()

Upvotes: 4

Related Questions