Pei Li

Reputation: 320

pandas cumsum based on the condition of column

I have a DataFrame that records the speed of a car over time; 'id' is the car's id. It looks like this:

import pandas as pd

df = pd.DataFrame({'id':[1,1,1,1,1,1,1,1,1,1],
                   'speed':[10,0,0,20,20,15,0,0,0,10],
                   'time':['2020-01-17 18:43:29',
                           '2020-01-17 18:43:48',
                           '2020-01-17 18:44:09',
                           '2020-01-17 18:44:28',
                           '2020-01-17 18:44:48',
                           '2020-01-17 18:46:05',
                           '2020-01-17 18:47:15',
                           '2020-01-17 18:47:24',
                           '2020-01-17 18:53:07',
                           '2020-01-17 18:58:36']})
df['time']=pd.to_datetime(df['time'])

I want to estimate how long the car is stopped (speed=0), so I first computed the time from each record to the next one:

df['time_diff']=(df['time'].shift(-1)-df['time']).dt.seconds

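Now I want to sum the column 'time_diff' over each run of consecutive rows where 'speed=0', with that total shown on every row of the run and 0 everywhere else. The result should look like this: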

[0, 40, 40, 0, 0, 0, 681, 681, 681, 0]

The key point is that the sum has to be taken separately for each run of consecutive 'speed=0' rows. I checked some similar answers but could not find a good solution.
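
To make the expected numbers concrete, here is a small check of where 40 and 681 come from. It is only a sketch that re-derives the gaps between consecutive timestamps from the data above; the names `times` and `gaps` are just for illustration.

```python
import pandas as pd

times = pd.to_datetime(['2020-01-17 18:43:29', '2020-01-17 18:43:48',
                        '2020-01-17 18:44:09', '2020-01-17 18:44:28',
                        '2020-01-17 18:44:48', '2020-01-17 18:46:05',
                        '2020-01-17 18:47:15', '2020-01-17 18:47:24',
                        '2020-01-17 18:53:07', '2020-01-17 18:58:36'])

# seconds from each record to the next (the last record has no next one)
gaps = [int((b - a).total_seconds()) for a, b in zip(times[:-1], times[1:])]
print(gaps)            # [19, 21, 19, 20, 77, 70, 9, 343, 329]

# rows 1-2 are the first stop, rows 6-8 are the second stop
print(sum(gaps[1:3]))  # 40
print(sum(gaps[6:9]))  # 681
```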

Upvotes: 1

Views: 96

Answers (1)

anky

Reputation: 75080

IIUC, try:

c = df['speed'].eq(0)  # condition: the car is stopped
# time to the next record, as in your question
s = (df['time'].shift(-1) - df['time']).dt.seconds
# c.ne(c.shift()).cumsum() labels each run of consecutive equal values of c;
# sum s within each run, then replace with 0 where c isn't met
s.groupby(c.ne(c.shift()).cumsum()).transform('sum').where(c, 0)  # .astype(int).tolist()

0      0
1     40
2     40
3      0
4      0
5      0
6    681
7    681
8    681
9      0
Name: time, dtype: int32
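
If the real data holds several cars, or gaps that can exceed a day, a variation along these lines may be safer. This is only a sketch, not the exact code above: `stop_seconds` is a hypothetical helper name, it assumes the rows are already sorted by time within each 'id', and it swaps `.dt.seconds` (which drops the days part of a timedelta) for `.dt.total_seconds()`.

```python
import pandas as pd

def stop_seconds(df):
    """Total stopped time (s) per run of consecutive speed == 0 rows, 0 elsewhere."""
    stopped = df['speed'].eq(0)
    # seconds to the next record of the same car; NaN on each car's last row
    gap = (df.groupby('id')['time'].shift(-1) - df['time']).dt.total_seconds()
    # run label: increases every time `stopped` flips between True and False
    run = stopped.ne(stopped.shift()).cumsum()
    # grouping by id as well keeps a run from spilling over into the next car
    total = gap.groupby([df['id'], run]).transform('sum')
    return total.where(stopped, 0).astype(int)

df = pd.DataFrame({'id': [1] * 10,
                   'speed': [10, 0, 0, 20, 20, 15, 0, 0, 0, 10],
                   'time': pd.to_datetime(['2020-01-17 18:43:29', '2020-01-17 18:43:48',
                                           '2020-01-17 18:44:09', '2020-01-17 18:44:28',
                                           '2020-01-17 18:44:48', '2020-01-17 18:46:05',
                                           '2020-01-17 18:47:15', '2020-01-17 18:47:24',
                                           '2020-01-17 18:53:07', '2020-01-17 18:58:36'])})

print(stop_seconds(df).tolist())  # [0, 40, 40, 0, 0, 0, 681, 681, 681, 0]
```

The NaN gap on a car's last row is skipped by the sum, so a stop that runs to the end of the data only counts the time between its recorded rows.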

Upvotes: 2
