Immortalz
Immortalz

Reputation: 89

Pandas calculate the time difference between columns for when a condition is satisfied

I have a pandas df that has 2 columns Day (Date in datetime format) and the Number of breakdowns.

I want to create two new columns, the first one being Days from Previous Breakdown Event (which I have already done), and the second one being Days from Next Breakdown Event (which I struggle to do).

Day            Number of breakdowns    Days from Previous Breakdown Event
2017-01-09                   0.0                                   0                                             
2017-01-12                   0.0                                   0
2017-01-13                   0.0                                   0
2017-01-14                   0.0                                   0
2017-01-16                   1.0                                   0
2017-01-17                   0.0                                   1
2017-01-18                   0.0                                   2
2017-01-19                   1.0                                   0
2017-01-20                   0.0                                   1
2017-01-21                   0.0                                   2
2017-01-23                   1.0                                   0

The Days from Previous Breakdown Event calculates the number of days that passed since there was a breakdown.

Code:

s = df.groupby(df['Number of breakdowns'].ne(0).cumsum())['Day'].transform('first')
df['Days from Previous Breakdown Event'] = (df['Day'] - s).dt.days
zeros_index = df['Number of breakdowns'].ne(0).idxmax()
df.loc[:zeros_index,'Days from Previous Breakdown Event'] = 0

I need help doing the Days from Next breakdown column, which I want to look like this:

Day            Number of breakdowns    Days from Next Breakdown Event
2017-01-09                   0.0                                   7                                             
2017-01-12                   0.0                                   4
2017-01-13                   0.0                                   3
2017-01-14                   0.0                                   2
2017-01-16                   1.0                                   0
2017-01-17                   0.0                                   2
2017-01-18                   0.0                                   1
2017-01-19                   1.0                                   0
2017-01-20                   0.0                                   3
2017-01-21                   0.0                                   2
2017-01-23                   1.0                                   0

Upvotes: 1

Views: 45

Answers (1)

jezrael
jezrael

Reputation: 863611

Swap order by iloc[::-1] with transform and last and then also swap s - df['Day']:

s = df.groupby(df['Number of breakdowns'].iloc[::-1].ne(0).cumsum())['Day'].transform('last')
df['Days from Next Breakdown Event'] = (s - df['Day']).dt.days
print (df)
          Day  Number of breakdowns  Days from Previous Breakdown Event  \
0  2017-01-09                   0.0                                   0   
1  2017-01-12                   0.0                                   0   
2  2017-01-13                   0.0                                   0   
3  2017-01-14                   0.0                                   0   
4  2017-01-16                   1.0                                   0   
5  2017-01-17                   0.0                                   1   
6  2017-01-18                   0.0                                   2   
7  2017-01-19                   1.0                                   0   
8  2017-01-20                   0.0                                   1   
9  2017-01-21                   0.0                                   2   
10 2017-01-23                   1.0                                   0   

    Days from Next Breakdown Event  
0                                7  
1                                4  
2                                3  
3                                2  
4                                0  
5                                2  
6                                1  
7                                0  
8                                3  
9                                2  
10                               0  

Detail:

print (s)
0    2017-01-16
1    2017-01-16
2    2017-01-16
3    2017-01-16
4    2017-01-16
5    2017-01-19
6    2017-01-19
7    2017-01-19
8    2017-01-23
9    2017-01-23
10   2017-01-23
Name: Day, dtype: datetime64[ns]

Upvotes: 1

Related Questions