Carlo Allocca
Carlo Allocca

Reputation: 649

Adding extra column as the cumulative time difference: a complex case

Please, I would like to add an extra column as the cumulative time difference (the ts_B aatribute) as exemplified by the following dataset:

df1 input:

id_B, ts_B,course,weight,Phase,remainingTime,progressPercentage
id1,2017-04-27 01:35:30,cotton,3.5,A,03:15:00,23.0
id1,2017-04-27 01:36:00,cotton,3.5,A,03:14:00,23.0  
id1,2017-04-27 01:36:30,cotton,3.5,A,03:14:00,24.0 
id1,2017-04-27 01:37:00,cotton,3.5,B,03:13:00,24.0
id1,2017-04-27 01:37:30,cotton,3.5,B,03:13:00,24.0
id1,2017-04-27 01:38:00,cotton,3.5,B,03:13:00,24.0
id1,2017-04-27 01:38:30,cotton,3.5,C,03:13:00,24.0
id1,2017-04-27 01:39:00,cotton,3.5,C,00:02:00,99.0
id1,2017-04-27 01:39:30,cotton,3.5,C,00:01:00,100.0
id1,2017-04-27 01:40:00,cotton,3.5,Finish,00:01:00,100.0
id1,2017-04-27 02:35:30,cotton,3.5,A,03:15:00,1.0
id1,2017-04-27 02:36:00,cotton,3.5,A,03:14:00,2.0  
id1,2017-04-27 02:36:30,cotton,3.5,A,03:14:00,2.0 
id1,2017-04-27 02:37:00,cotton,3.5,B,03:13:00,3.0
id1,2017-04-27 02:37:30,cotton,3.5,B,03:13:00,4.0
id1,2017-04-27 02:38:00,cotton,3.5,B,03:13:00,5.0
id1,2017-04-27 02:38:30,cotton,3.5,C,03:13:00,98.0
id1,2017-04-27 02:39:00,cotton,3.5,C,00:02:00,99.0
id1,2017-04-27 02:39:30,cotton,3.5,C,00:01:00,100.0
id1,2017-04-27 02:40:00,cotton,3.5,Finish,00:01:00,100.0
id2,2017-04-27 03:36:00,cotton,3.5,A,03:14:00,1.0
id2,2017-04-27 03:36:30,cotton,3.5,A,03:14:00,1.0 
id2,2017-04-27 03:37:00,cotton,3.5,B,03:13:00,2.0
id2,2017-04-27 03:37:30,cotton,3.5,B,03:13:00,2.0
id2,2017-04-27 03:38:00,cotton,3.5,B,03:13:00,3.0
id2,2017-04-27 03:38:30,cotton,3.5,C,03:13:00,98.0
id2,2017-04-27 03:39:00,cotton,3.5,C,00:02:00,99.0
id2,2017-04-27 03:39:30,cotton,3.5,C,00:01:00,100.0
id2,2017-04-27 03:40:00,cotton,3.5,Finish,00:01:00,100.0

df output:

 id_B,ts_B,course,weight,Phase,remainingTime,progressPercentage,cum_delta_sec
 id1,2017-04-27 01:35:30,cotton,3.5,A,03:15:00,23.0,                0
 id1,2017-04-27 01:36:00,cotton,3.5,A,03:14:00,23.0,                30  
 id1,2017-04-27 01:36:30,cotton,3.5,A,03:14:00,24.0,                60 
 id1,2017-04-27 01:37:00,cotton,3.5,B,03:13:00,24.0,                90
 id1,2017-04-27 01:37:30,cotton,3.5,B,03:13:00,24.0,                120
 id1,2017-04-27 01:38:00,cotton,3.5,B,03:13:00,24.0,                150
 id1,2017-04-27 01:38:30,cotton,3.5,C,03:13:00,24.0,                180
 id1,2017-04-27 01:39:00,cotton,3.5,C,00:02:00,99.0,                210
 id1,2017-04-27 01:39:30,cotton,3.5,C,00:01:00,100.0,               240
 id1,2017-04-27 01:40:00,cotton,3.5,Finish,00:01:00,100.0,          270
 id1,2017-04-27 02:35:30,cotton,3.5,A,03:15:00,1.0,                 0
 id1,2017-04-27 02:36:00,cotton,3.5,A,03:14:00,2.0,                 30  
 id1,2017-04-27 02:36:30,cotton,3.5,A,03:14:00,2.0,                 60 
 id1,2017-04-27 02:37:00,cotton,3.5,B,03:13:00,3.0,                 90
 id1,2017-04-27 02:37:30,cotton,3.5,B,03:13:00,4.0,                 120
 id1,2017-04-27 02:38:00,cotton,3.5,B,03:13:00,5.0,                 150
 id1,2017-04-27 02:38:30,cotton,3.5,C,03:13:00,98.0,                180
 id1,2017-04-27 02:39:00,cotton,3.5,C,00:02:00,99.0,                210
 id1,2017-04-27 02:39:30,cotton,3.5,C,00:01:00,100.0,               240
 id1,2017-04-27 02:40:00,cotton,3.5,Finish,00:01:00,100.0,          270
 id2,2017-04-27 03:36:00,cotton,3.5,A,03:14:00,1.0,                 0  
 id2,2017-04-27 03:36:30,cotton,3.5,A,03:14:00,1.0,                 30 
 id2,2017-04-27 03:37:00,cotton,3.5,B,03:13:00,2.0,                 60
 id2,2017-04-27 03:37:30,cotton,3.5,B,03:13:00,2.0,                 90
 id2,2017-04-27 03:38:00,cotton,3.5,B,03:13:00,3.0,                 120
 id2,2017-04-27 03:38:30,cotton,3.5,C,03:13:00,98.0,                150
 id2,2017-04-27 03:39:00,cotton,3.5,C,00:02:00,99.0,                180
 id2,2017-04-27 03:39:30,cotton,3.5,C,00:01:00,100.0,               210
 id2,2017-04-27 03:40:00,cotton,3.5,Finish,00:01:00,100.0,          240

The solution provided at Add extra column as the cumulative time difference is a bit different be applied as the key to identify the full experiment is not based on the id only. I need to plug in the the id and the fact that for each id, the "progressPercentage" goes from x>0 to 100.

Please, any help on this? Thanks you in advance for your support. Best Regards, Carlo

Upvotes: 1

Views: 47

Answers (1)

jezrael
jezrael

Reputation: 862541

You can use shift with comapre somee edge value like Finish in Phase column with cumsum as new Series for grouping:

df.ts_B = pd.to_datetime(df.ts_B)
a = df['Phase'].shift().eq('Finish').cumsum()

df['cum_delta_sec'] = df.ts_B.astype(np.int64).div(10**9)
                       .groupby([df.id_B,a])
                       .transform(lambda x: x.diff().fillna(0).cumsum())
print (df)

   id_B                ts_B  course  weight   Phase remainingTime  \
0   id1 2017-04-27 01:35:30  cotton     3.5       A      03:15:00   
1   id1 2017-04-27 01:36:00  cotton     3.5       A      03:14:00   
2   id1 2017-04-27 01:36:30  cotton     3.5       A      03:14:00   
3   id1 2017-04-27 01:37:00  cotton     3.5       B      03:13:00   
4   id1 2017-04-27 01:37:30  cotton     3.5       B      03:13:00   
5   id1 2017-04-27 01:38:00  cotton     3.5       B      03:13:00   
6   id1 2017-04-27 01:38:30  cotton     3.5       C      03:13:00   
7   id1 2017-04-27 01:39:00  cotton     3.5       C      00:02:00   
8   id1 2017-04-27 01:39:30  cotton     3.5       C      00:01:00   
9   id1 2017-04-27 01:40:00  cotton     3.5  Finish      00:01:00   
10  id1 2017-04-27 02:35:30  cotton     3.5       A      03:15:00   
11  id1 2017-04-27 02:36:00  cotton     3.5       A      03:14:00   
12  id1 2017-04-27 02:36:30  cotton     3.5       A      03:14:00   
13  id1 2017-04-27 02:37:00  cotton     3.5       B      03:13:00   
14  id1 2017-04-27 02:37:30  cotton     3.5       B      03:13:00   
15  id1 2017-04-27 02:38:00  cotton     3.5       B      03:13:00   
16  id1 2017-04-27 02:38:30  cotton     3.5       C      03:13:00   
17  id1 2017-04-27 02:39:00  cotton     3.5       C      00:02:00   
18  id1 2017-04-27 02:39:30  cotton     3.5       C      00:01:00   
19  id1 2017-04-27 02:40:00  cotton     3.5  Finish      00:01:00   
20  id2 2017-04-27 03:36:00  cotton     3.5       A      03:14:00   
21  id2 2017-04-27 03:36:30  cotton     3.5       A      03:14:00   
22  id2 2017-04-27 03:37:00  cotton     3.5       B      03:13:00   
23  id2 2017-04-27 03:37:30  cotton     3.5       B      03:13:00   
24  id2 2017-04-27 03:38:00  cotton     3.5       B      03:13:00   
25  id2 2017-04-27 03:38:30  cotton     3.5       C      03:13:00   
26  id2 2017-04-27 03:39:00  cotton     3.5       C      00:02:00   
27  id2 2017-04-27 03:39:30  cotton     3.5       C      00:01:00   
28  id2 2017-04-27 03:40:00  cotton     3.5  Finish      00:01:00   

    progressPercentage  cum_delta_sec  
0                 23.0            0.0  
1                 23.0           30.0  
2                 24.0           60.0  
3                 24.0           90.0  
4                 24.0          120.0  
5                 24.0          150.0  
6                 24.0          180.0  
7                 99.0          210.0  
8                100.0          240.0  
9                100.0          270.0  
10                 1.0            0.0  
11                 2.0           30.0  
12                 2.0           60.0  
13                 3.0           90.0  
14                 4.0          120.0  
15                 5.0          150.0  
16                98.0          180.0  
17                99.0          210.0  
18               100.0          240.0  
19               100.0          270.0  
20                 1.0            0.0  
21                 1.0           30.0  
22                 2.0           60.0  
23                 2.0           90.0  
24                 3.0          120.0  
25                98.0          150.0  
26                99.0          180.0  
27               100.0          210.0  
28               100.0          240.0  

Upvotes: 1

Related Questions