Reputation: 94
I want the time difference between consecutive rows, and its running total, for as long as "state" == 1 is active, and 'off' otherwise.
timestamp state
2020-01-01 00:00:00 0
2020-01-01 00:00:01 0
2020-01-01 00:00:02 0
2020-01-01 00:00:03 1
2020-01-01 00:00:04 1
2020-01-01 00:00:05 1
2020-01-01 00:00:06 1
2020-01-01 00:00:07 0
2020-01-01 00:00:08 0
2020-01-01 00:00:09 0
2020-01-01 00:00:10 0
2020-01-01 00:00:11 1
2020-01-01 00:00:12 1
2020-01-01 00:00:13 1
2020-01-01 00:00:14 1
2020-01-01 00:00:15 1
2020-01-01 00:00:16 1
2020-01-01 00:00:17 0
2020-01-01 00:00:18 0
2020-01-01 00:00:19 0
2020-01-01 00:00:20 0
Based on a similar question, I tried something with groupby. However, the code does not stop computing the time difference when "state" == 0.
I also tried applying a lambda function (commented out), but it raises "KeyError: ('state', 'occurred at index timestamp')".
Any idea how to do this better?
import numpy as np
import pandas as pd
dt = pd.date_range('2020-01-01', '2020-01-01 00:00:20',freq='1s')
s = [0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,1,1,0,0,0,0]
df = pd.DataFrame({'timestamp': dt,
'state': s})
df['timestamp']=pd.to_datetime(df.timestamp, format='%Y-%m-%d %H:%M:%S')
df['tdiff']=(df.groupby('state').diff().timestamp.values/60)
#df['tdiff'] = df.apply(lambda x: x['timestamp'].diff().state.values/60 if x['state'] == 1 else 'off')
The desired output should be:
timestamp state tdiff accum.
2020-01-01 00:00:00 0 off 0
2020-01-01 00:00:01 0 off 0
2020-01-01 00:00:02 0 off 0
2020-01-01 00:00:03 1 nan 0
2020-01-01 00:00:04 1 1.0 1.0
2020-01-01 00:00:05 1 1.0 2.0
2020-01-01 00:00:06 1 1.0 3.0
2020-01-01 00:00:07 0 off 0
2020-01-01 00:00:08 0 off 0
2020-01-01 00:00:09 0 off 0
2020-01-01 00:00:10 0 off 0
2020-01-01 00:00:11 1 nan 0
2020-01-01 00:00:12 1 1.0 1.0
2020-01-01 00:00:13 1 1.0 2.0
2020-01-01 00:00:14 1 1.0 3.0
2020-01-01 00:00:15 1 1.0 4.0
2020-01-01 00:00:16 1 1.0 5.0
Upvotes: 2
Views: 77
Reputation: 765
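# df1 is assumed to be the DataFrame from the question (called df there).
def function1(dd:pd.DataFrame):
    # Groups with a single row are treated as "off".
    if dd.pipe(len)<2:
        return dd.assign(tdiff='off',accum=0)
    else:
        # Longer groups are runs of state == 1: one unit per row, counted from 0.
        dd1=dd.assign(tdiff=1,accum=range(0,dd.pipe(len)))
        # The first row of a run has no previous row to diff against.
        dd1.loc[dd.index.min(),'tdiff']=pd.NA
        return dd1

# Key that increases on every row where state != 1, so a run of 1s shares one value.
col1=df1.state.ne(1).cumsum()
df1.assign(col1=col1).groupby(['state',col1],as_index=False).apply(function1)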
Output:
timestamp state tdiff accum
0 2020-01-01 00:00:00 0 off NaN
1 2020-01-01 00:00:01 0 off NaN
2 2020-01-01 00:00:02 0 off NaN
3 2020-01-01 00:00:03 1 NaN NaN
4 2020-01-01 00:00:04 1 1 1.0
5 2020-01-01 00:00:05 1 1 2.0
6 2020-01-01 00:00:06 1 1 3.0
7 2020-01-01 00:00:07 0 off NaN
8 2020-01-01 00:00:08 0 off NaN
9 2020-01-01 00:00:09 0 off NaN
10 2020-01-01 00:00:10 0 off NaN
11 2020-01-01 00:00:11 1 NaN NaN
12 2020-01-01 00:00:12 1 1 1.0
13 2020-01-01 00:00:13 1 1 2.0
14 2020-01-01 00:00:14 1 1 3.0
15 2020-01-01 00:00:15 1 1 4.0
16 2020-01-01 00:00:16 1 1 5.0
17 2020-01-01 00:00:17 0 off NaN
18 2020-01-01 00:00:18 0 off NaN
19 2020-01-01 00:00:19 0 off NaN
20 2020-01-01 00:00:20 0 off NaN
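To make the grouping key used above easier to follow, here is a minimal sketch that builds the same key for the sample states from the question and counts the rows in each run of 1s. The names run_key, on_keys and run_lengths are only illustrative and do not come from the answer.

```python
import pandas as pd

# Sample states from the question.
state = pd.Series([0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,1,1,0,0,0,0])

# The key increases by one on every row where state != 1,
# so all rows of one run of 1s share the same value.
run_key = state.ne(1).cumsum()

# Keep only the keys of the state == 1 rows and count the rows per run.
on_keys = run_key[state.eq(1)]
run_lengths = on_keys.value_counts().sort_index()
print(run_lengths)
```

Note that the 0 row just before a run carries the same key value as the run itself (index 2 and indexes 3 to 6 all get key 3 here), which is why the key has to be combined with the state column, or applied only to the state == 1 rows.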
Upvotes: 0
Reputation: 323286
You can use groupby with cumsum to build the additional group key:
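# Group key: increases on every state == 0 row, so each run of 1s shares one key.
# group_keys=False keeps the original row index in the apply() result (needed on recent pandas).
g = df.loc[df['state'].ne(0)].groupby(df['state'].eq(0).cumsum(), group_keys=False)['timestamp']
s1 = g.diff().dt.total_seconds()  # seconds since the previous row within the same run
s2 = g.apply(lambda x : x.diff().dt.total_seconds().cumsum())  # running total per run
df['tdiff'] = 'off'
df.loc[df['state'].ne(0),'tdiff'] = s1
df['accum'] = s2
# Note: NaN is not filled with 0 here; if you want that, use df['accum'] = df['accum'].fillna(0)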
df
Out[53]:
timestamp state tdiff accum
0 2020-01-01 00:00:00 0 off NaN
1 2020-01-01 00:00:01 0 off NaN
2 2020-01-01 00:00:02 0 off NaN
3 2020-01-01 00:00:03 1 NaN NaN
4 2020-01-01 00:00:04 1 1 1.0
5 2020-01-01 00:00:05 1 1 2.0
6 2020-01-01 00:00:06 1 1 3.0
7 2020-01-01 00:00:07 0 off NaN
8 2020-01-01 00:00:08 0 off NaN
9 2020-01-01 00:00:09 0 off NaN
10 2020-01-01 00:00:10 0 off NaN
11 2020-01-01 00:00:11 1 NaN NaN
12 2020-01-01 00:00:12 1 1 1.0
13 2020-01-01 00:00:13 1 1 2.0
14 2020-01-01 00:00:14 1 1 3.0
15 2020-01-01 00:00:15 1 1 4.0
16 2020-01-01 00:00:16 1 1 5.0
17 2020-01-01 00:00:17 0 off NaN
18 2020-01-01 00:00:18 0 off NaN
19 2020-01-01 00:00:19 0 off NaN
20 2020-01-01 00:00:20 0 off NaN
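If you want the accum column to contain 0 instead of NaN, as in the desired output of the question, the same idea can be written without apply. This is only a sketch under the assumption that "on" means state == 1; the function name add_on_durations and the helper names on, run_id and secs are made up for illustration.

```python
import pandas as pd

def add_on_durations(df):
    """Return a copy of df with 'tdiff' and 'accum' columns (hypothetical helper)."""
    out = df.copy()
    on = out['state'].eq(1)
    # Key that increases on every "off" row, so each run of 1s shares one value.
    run_id = (~on).cumsum()
    # Seconds since the previous row, computed only inside each run of 1s.
    secs = out.loc[on, 'timestamp'].groupby(run_id[on]).diff().dt.total_seconds()
    # 'off' for state != 1, NaN on the first row of a run, seconds otherwise.
    out['tdiff'] = secs.reindex(out.index).astype(object).where(on, 'off')
    # Running total per run; off rows and the first row of a run become 0.
    out['accum'] = secs.groupby(run_id[on]).cumsum().reindex(out.index).fillna(0.0)
    return out

# Sample data from the question.
dt = pd.date_range('2020-01-01', '2020-01-01 00:00:20', freq='1s')
s = [0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,1,1,0,0,0,0]
df = pd.DataFrame({'timestamp': dt, 'state': s})

out = add_on_durations(df)
print(out)
```

Because the function works on a copy, the original df is left untouched; drop the fillna(0.0) call if you prefer NaN for the rows without an accumulated value.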
Upvotes: 2