RMRiver

Reputation: 645

Calculate duration of True values in column with an index containing seconds

I would like to calculate the total duration for which a column is True, in a dataframe whose index contains time in seconds.

I have a dataframe:

import pandas as pd
df = pd.DataFrame({'a': {0.0: False, 0.5: False, 1.0: False, 1.5: True, 2.0: True, 4.0: False, 8.0: True, 10.0: False}})

>>> df
          a
0.0   False
0.5   False
1.0   False
1.5    True
2.0    True
4.0   False
8.0    True
10.0  False

The index of the dataframe contains seconds. I think the most appropriate method would be to find the first occurrence of True in each run of True values, measure the time until the next False is found, and keep a running total (RT). The result would be something like this:

>>> df
          a
0.0   False
0.5   False
1.0   False
1.5    True <- found first occurrence (1.5)
2.0    True
4.0   False <- found False (4.0) (running total: 4 - 1.5 + RT = 2.5)
8.0    True <- found first occurrence (8.0)
10.0  False <- found False (10.0) (running total: 10 - 8 + RT = 4.5)

RT = 4.5 seconds
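
The running total described above can be written as a plain loop. This is only a minimal sketch for checking the expected result: the function name true_duration_loop is made up for illustration, and it assumes that a True streak with no False after it contributes nothing.

```python
import pandas as pd

df = pd.DataFrame({'a': {0.0: False, 0.5: False, 1.0: False, 1.5: True,
                         2.0: True, 4.0: False, 8.0: True, 10.0: False}})

def true_duration_loop(s):
    """Sum the time from the first True of each streak to the next False.

    Hypothetical helper; a streak that is never closed by a False is ignored.
    """
    total = 0.0
    start = None  # index label where the current True streak began
    for t, flag in s.items():
        if flag and start is None:
            start = t              # first True of a new streak
        elif not flag and start is not None:
            total += t - start     # a False closes the streak
            start = None
    return total

rt = true_duration_loop(df['a'])
print(rt)  # 4.5
```

A loop like this is easy to reason about but slow on large frames, which is why a vectorized approach is usually preferred.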

Upvotes: 1

Views: 97

Answers (1)

ALollz

Reputation: 59519

Create a helper column that uniquely labels the False values; these numbers indicate the possible ends of a True streak. Then merge the False rows with the first True row of the preceding streak and sum the time differences.

df = df.rename_axis(index='time').reset_index()
df['end'] = (~df['a']).cumsum()
#   time      a  end
#0   0.0  False    1
#1   0.5  False    2
#2   1.0  False    3
#3   1.5   True    3
#4   2.0   True    3
#5   4.0  False    4
#6   8.0   True    4
#7  10.0  False    5

res = pd.merge(df[df['a'].eq(False)],
               df[df['a'].eq(True)].assign(end=df['end']+1).drop_duplicates('end'),
               on='end')
#   time_x    a_x  end  time_y   a_y
#0     4.0  False    4     1.5  True
#1    10.0  False    5     8.0  True

(res['time_x'] - res['time_y']).sum()
#4.5

To explain the merge, the left DataFrame is just every row in the original that contains False:

df[df['a'].eq(False)]
#   time      a  end
#0   0.0  False    1
#1   0.5  False    2
#2   1.0  False    3
#5   4.0  False    4
#7  10.0  False    5

The right DataFrame is a little more complicated. I take only the True rows and add one to end, because we want to match each True with the False that comes after it (assign aligns df['end']+1 on the index, so passing the full-length column is fine). Then I drop_duplicates on end, because we only want to keep the first True when there are consecutive True values.

df[df['a'].eq(True)].assign(end=df['end']+1).drop_duplicates('end')
#   time     a  end
#3   1.5  True    4
#6   8.0  True    5
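
One detail that may not be obvious is that assign is given the full-length df['end']+1 while the frame it is applied to contains only the True rows. The sketch below illustrates that pandas aligns the assigned Series on the index, so the result is the same as adding one to the filtered column. The variable names trues, shifted and first_true are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': {0.0: False, 0.5: False, 1.0: False, 1.5: True,
                         2.0: True, 4.0: False, 8.0: True, 10.0: False}})
df = df.rename_axis(index='time').reset_index()
df['end'] = (~df['a']).cumsum()

trues = df[df['a']]                        # only the True rows
shifted = trues.assign(end=df['end'] + 1)  # full-length Series, aligned on index

# Same values as computing the shift from the filtered frame itself
assert shifted['end'].equals(trues['end'] + 1)

# drop_duplicates keeps the first row per label by default,
# i.e. the first True of each streak
first_true = shifted.drop_duplicates('end')
print(first_true['time'].tolist())  # [1.5, 8.0]
```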

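Now by merging on='end' we are able to match the first True (in a possible consecutive group of True values) with the first False value that follows it. Because merge defaults to an inner join, False rows with no preceding True streak (and a trailing True streak with no closing False) simply drop out. Since we brought the index along as 'time' we can calculate the time difference between these observations.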
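
For comparison, here is a sketch of an alternative that avoids the merge. It is not the method used above, just a different way to reach the same pairing: compare each value with the previous one using shift to find where streaks start and end. It assumes the index is sorted by time and, like the merge, it ignores a trailing True streak that has no closing False.

```python
import pandas as pd

df = pd.DataFrame({'a': {0.0: False, 0.5: False, 1.0: False, 1.5: True,
                         2.0: True, 4.0: False, 8.0: True, 10.0: False}})

a = df['a']
prev = a.shift(fill_value=False).astype(bool)  # value of the previous row
times = a.index.to_numpy()

starts = times[(a & ~prev).to_numpy()]  # True preceded by False: streak begins
ends = times[(~a & prev).to_numpy()]    # False preceded by True: streak ends

# Every end has a start before it, so pairing them in order is safe;
# an unclosed final streak leaves one extra start, which is dropped.
total = float((ends - starts[:len(ends)]).sum())
print(total)  # 4.5
```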

Upvotes: 1
