How to calculate the number of days since last event==1?

Question

I am doing a classification problem in which I am trying to predict whether a car will be refuelled the following day.

The data consists of a date, an ID for every car, and a dummy variable indicating whether the car was refuelled on that specific date.

What I want to achieve is the column "days_since_refuelled". It should contain the number of days since the last previous occurrence of refuelled==1, calculated separately for every car_id. If there has been no previous instance of refuelled==1, the value should be -1.

The desired output should look like this:

date          car_id    refuelled    days_since_refuelled
01-01-2019    1         0            -1
01-01-2019    2         1            -1
01-01-2019    3         1            -1
06-01-2019    1         0            -1
06-01-2019    2         0            5
06-01-2019    3         0            5
09-01-2019    1         1            -1
09-01-2019    2         0            8
09-01-2019    3         0            8
14-01-2019    1         0            5
14-01-2019    2         1            13
14-01-2019    3         0            13

jezrael · Accepted Answer

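Keep the date only on rows where refuelled is 1 with Series.where (all other rows become missing values), then use Series.shift with ffill per car_id group to get the most recent earlier refuel date. Subtract that from the date column with Series.sub, convert the timedeltas to whole days with Series.dt.days, and replace the remaining missing values with -1 using Series.fillna: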

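import pandas as pd

# convert the day-first strings (e.g. 06-01-2019 is 6 January 2019) to datetimes
df['date'] = pd.to_datetime(df['date'], dayfirst=True)

# keep the date only where the car was refuelled, NaT elsewhere
df['days_since_refuelled'] = df['date'].where(df['refuelled'].eq(1))
# per car: shift so the current day's refuel is not counted, then forward-fill;
# transform keeps the original index (apply can add a group level in pandas 2.x)
df['days_since_refuelled'] = (df['date'].sub(df.groupby('car_id')['days_since_refuelled']
                                        .transform(lambda x: x.shift().ffill())
                                        )
                                        .dt.days
                                        .fillna(-1)
                                        .astype(int))
print(df)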
         date  car_id  refuelled  days_since_refulled  days_since_refuelled
0  2019-01-01       1          0                   -1                    -1
1  2019-01-01       2          1                   -1                    -1
2  2019-01-01       3          1                   -1                    -1
3  2019-01-06       1          0                   -1                    -1
4  2019-01-06       2          0                    5                     5
5  2019-01-06       3          0                    5                     5
6  2019-01-09       1          1                   -1                    -1
7  2019-01-09       2          0                    8                     8
8  2019-01-09       3          0                    8                     8
9  2019-01-14       1          0                    5                     5
10 2019-01-14       2          1                   13                    13
11 2019-01-14       3          0                   13                    13
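
For anyone who wants to try this without having the original DataFrame, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same where / shift / ffill idea. The helper name days_since_last_event and its parameters are made up for illustration, and it uses the built-in grouped shift() and ffill() instead of a lambda, which is a variant of the answer rather than the exact code above. It also sorts by car and date first, on the assumption that rows may not arrive in date order.

```python
import pandas as pd

# Sample data copied from the question (dates are day-first strings).
df = pd.DataFrame({
    "date": ["01-01-2019"] * 3 + ["06-01-2019"] * 3
            + ["09-01-2019"] * 3 + ["14-01-2019"] * 3,
    "car_id": [1, 2, 3] * 4,
    "refuelled": [0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0],
})
df["date"] = pd.to_datetime(df["date"], dayfirst=True)


def days_since_last_event(frame, key="car_id", flag="refuelled", when="date"):
    """Hypothetical helper: days since the previous row with flag == 1, per key."""
    # Assumption: input may be unsorted, so order by group and date first.
    ordered = frame.sort_values([key, when])
    # Date on event rows, NaT everywhere else.
    event_date = ordered[when].where(ordered[flag].eq(1))
    # shift() drops the current row's own event; ffill() carries the last one forward.
    shifted = event_date.groupby(ordered[key]).shift()
    last_event = shifted.groupby(ordered[key]).ffill()
    # Whole days since the last event; -1 where there was none.
    return (ordered[when] - last_event).dt.days.fillna(-1).astype(int)


# Assignment aligns on the index, so the original row order is preserved.
df["days_since_refuelled"] = days_since_last_event(df)
print(df)
```

Because the helper returns a Series that keeps the original index labels, assigning it back to df works even though the rows were sorted inside the function.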
