Reputation: 645
I would like to calculate the total duration for which a column is True, in a dataframe whose index contains seconds.
I have a dataframe:
import pandas as pd
df = pd.DataFrame({'a': {0.0: False, 0.5: False, 1.0: False, 1.5: True, 2.0: True, 4.0: False, 8.0: True, 10.0: False}})
>>> df
a
0.0 False
0.5 False
1.0 False
1.5 True
2.0 True
4.0 False
8.0 True
10.0 False
The index of the dataframe contains seconds. I think the most appropriate method would be to take the first occurrence in each section of True values, continue until a False is found, and keep a running total (RT). The result would be something like this:
>>> df
a
0.0 False
0.5 False
1.0 False
1.5 True <- found first occurrence (1.5)
2.0 True
4.0 False <- found False (4.0) (running total: 4 - 1.5 + RT = 2.5)
8.0 True <- found first occurrence (8.0)
10.0 False <- found False (10.0) (running total: 10 - 8 + RT = 4.5)
RT = 4.5 seconds
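The procedure described above can be sketched as a plain Python loop. This is only a minimal illustration of the intended bookkeeping, not a tuned solution; `running_total` is a hypothetical name, and the sketch assumes the index is sorted and holds seconds.

```python
import pandas as pd

df = pd.DataFrame({'a': {0.0: False, 0.5: False, 1.0: False, 1.5: True,
                         2.0: True, 4.0: False, 8.0: True, 10.0: False}})

def running_total(frame, col='a'):
    """Hypothetical helper: sum the time from each first True to the next False."""
    total = 0.0
    start = None                  # time of the first True in the current streak
    for t, flag in frame[col].items():
        if flag and start is None:
            start = t             # found first occurrence
        elif not flag and start is not None:
            total += t - start    # found False: close the streak
            start = None
    return total

rt = running_total(df)
print(rt)
```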
Upvotes: 1
Views: 97
Reputation: 59519
Create a helper column that uniquely labels the False values; these numbers mark the possible ends of a True streak. Then merge the False rows with the first True row of the preceding streak and sum the time differences.
df = df.rename_axis(index='time').reset_index()
df['end'] = (~df['a']).cumsum()
# time a end
#0 0.0 False 1
#1 0.5 False 2
#2 1.0 False 3
#3 1.5 True 3
#4 2.0 True 3
#5 4.0 False 4
#6 8.0 True 4
#7 10.0 False 5
res = pd.merge(df[df['a'].eq(False)],
df[df['a'].eq(True)].assign(end=df['end']+1).drop_duplicates('end'),
on='end')
# time_x a_x end time_y a_y
#0 4.0 False 4 1.5 True
#1 10.0 False 5 8.0 True
(res['time_x'] - res['time_y']).sum()
#4.5
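For reuse, the same steps could be wrapped in a function along these lines. This is a sketch under assumptions: `true_duration` and its `col` parameter are hypothetical names, and the index is assumed to be sorted and to hold seconds. Note that a True streak still open at the last row has no closing False, so the inner merge drops it and it contributes nothing to the sum.

```python
import pandas as pd

def true_duration(frame, col='a'):
    """Hypothetical wrapper around the merge approach shown above."""
    d = frame.rename_axis(index='time').reset_index()
    d['end'] = (~d[col]).cumsum()          # label each False row
    falses = d[~d[col]]
    # First True of each streak, relabelled to match the False that follows it.
    first_trues = (d[d[col]]
                   .assign(end=lambda x: x['end'] + 1)
                   .drop_duplicates('end'))
    res = pd.merge(falses, first_trues, on='end')
    return (res['time_x'] - res['time_y']).sum()

df = pd.DataFrame({'a': {0.0: False, 0.5: False, 1.0: False, 1.5: True,
                         2.0: True, 4.0: False, 8.0: True, 10.0: False}})
total = true_duration(df)
print(total)

# A streak that never sees a closing False is not counted.
open_ended = pd.DataFrame({'a': [False, True, True]}, index=[0.0, 1.0, 2.0])
open_total = true_duration(open_ended)
```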
To explain the merge, the left DataFrame is just every row in the original that contains False:
df[df['a'].eq(False)]
# time a end
#0 0.0 False 1
#1 0.5 False 2
#2 1.0 False 3
#5 4.0 False 4
#7 10.0 False 5
The right DataFrame is a little more complicated. I take only the True rows, but drop_duplicates on end because we only want to keep the first True when there are consecutive True values. Finally, because we want to match the True with the False that comes after it, we need to add one to end.
df[df['a'].eq(True)].assign(end=df['end']+1).drop_duplicates('end')
# time a end
#3 1.5 True 4
#6 8.0 True 5
Now by merging on='end' we are able to match the first True (in a possible consecutive group of True values) with the first False value that follows it. Since we brought the index along as 'time', we can calculate the time difference between these observations.
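As a cross-check, a different technique (not the merge above) should give the same total on this data: treat each row's value as holding until the next row, and sum those gaps over the True rows. This is a sketch that assumes a sorted index of seconds; the variable names are illustrative. The two approaches can differ when the data ends on a True, since this one still counts the gaps between the trailing True rows.

```python
import pandas as pd

df = pd.DataFrame({'a': {0.0: False, 0.5: False, 1.0: False, 1.5: True,
                         2.0: True, 4.0: False, 8.0: True, 10.0: False}})

s = df['a']
times = pd.Series(s.index, index=s.index)
# Seconds from each row to the next; the last row has no successor (NaN).
to_next = times.shift(-1) - times
# Keep only the gaps that start on a True row; sum() skips the NaN.
total = to_next[s].sum()
print(total)
```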
Upvotes: 1