Reputation: 1272
I am seeking to compute the rate of an event occurrence in a dataframe.
I have a dataframe in pandas that tracks the onset and duration of a certain event. Initially it will look something along the lines of:
   onset  duration label channels  end_time
0    1.5       0.1   HFO       A1        10
1    2.0       1.0   HFO       A2        10
2    3.0       1.0   HFO       A3        10
3    5.5       0.1   HFO       A1        10
where onset, duration, and end_time are in seconds, and channels denotes a unique set of groups that I want to loop over.
I would like to get something like this:
rate_dict = {
    'A1': 0.2,  # rate of 0.2 per second (i.e. 2 occurrences over a 10 second time frame)
    'A2': 0.1,  # rate of 0.1 per second
    'A3': 0.1
}
First, I obtain a group based on channels:
for idx, group in df.groupby(['channels']):
Then I convert things to a datetime index:
                                  onset  duration label channels  end_time
timestamp
2021-02-10 19:25:19.391130+00:00    1.5       0.1   HFO       A1        10
2021-02-10 19:25:23.391130+00:00    5.5       0.1   HFO       A1        10
Next, I considered re-indexing over the start (0 seconds) and end time (10 seconds in this case):
# rate is, say, 's' to create a dummy row for every second
dt_idx = pd.date_range(ref_timestamp, end_timestamp, freq=rate)
group = group.reindex(dt_idx, fill_value=np.nan)
The issue is that it does not pick up the events that occur at 1.5 and 5.5 seconds for channel A1, so I end up with rows that are all NaNs, whereas ideally I would get a count of 2 over the resampled period.
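Here is a minimal reproduction of what I am seeing (the timestamps are made up for illustration). As far as I can tell, reindex only keeps labels that match the new index exactly, and the fractional-second event timestamps never match the whole-second index; flooring the index to whole seconds first makes them line up:
import pandas as pd

# made-up reference timestamps for illustration
ref_timestamp = pd.Timestamp('2021-02-10 19:25:19', tz='UTC')
end_timestamp = ref_timestamp + pd.Timedelta(seconds=10)

# the two A1 events, stamped 1.5s and 5.5s after the reference
group = pd.DataFrame(
    {'onset': [1.5, 5.5]},
    index=[ref_timestamp + pd.Timedelta(seconds=s) for s in (1.5, 5.5)])

dt_idx = pd.date_range(ref_timestamp, end_timestamp, freq='s')
print(group.reindex(dt_idx))  # every row is NaN: no exact index match

# flooring the event timestamps to whole seconds makes them line up
floored = group.set_index(group.index.floor('s'))
print(floored.reindex(dt_idx)['onset'].count())  # 2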
Ideally, I could specify other rate strings (e.g. 'hr') and it would return the rate per hour. In this case that would be:
rate_dict = {
    'A1': 2.0,  # rate of 2 per hour (i.e. 2 occurrences over a 1 hour time frame)
    'A2': 1.0,  # rate of 1 per hour
    'A3': 1.0
}
Upvotes: 3
Views: 866
Reputation: 410
First, we can recreate your table as a Pandas DataFrame:
import pandas as pd
d = {'onset': [1.5, 2.0, 3.0, 5.5],
     'duration': [0.1, 1.0, 1.0, 0.1],
     'label': ['HFO', 'HFO', 'HFO', 'HFO'],
     'channels': ['A1', 'A2', 'A3', 'A1'],
     'end_time': [10.0, 10.0, 10.0, 10.0]}
df = pd.DataFrame(d)
To solve your question directly, in terms of occurrences per second, we can count the occurrences and divide by the mean end_time:
df.groupby('channels').end_time.agg(lambda x: x.count()/x.mean()).to_dict()
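For the example data, this returns {'A1': 0.2, 'A2': 0.1, 'A3': 0.1}, which matches the per-second dictionary in the question.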
To generalize this, let's create a function to_freq that takes as input both the series x and the desired rate string rate:
import numpy as np

def to_freq(x, rate='s'):
    # seconds per unit of the requested rate string
    d = {'s': 1, 'm': 60, 'h': 60 * 60, 'd': 60 * 60 * 24}
    # count of events divided by the number of (whole or partial)
    # units of `rate` spanned by the recording window
    return x.count() / np.ceil(x.mean() / d[rate])
Now, our original code becomes:
df.groupby('channels').end_time.agg(lambda x: to_freq(x)).to_dict()
And we can find the occurrences-per-hour as follows:
df.groupby('channels').end_time.agg(lambda x: to_freq(x, rate='h')).to_dict()
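For the example data, this returns {'A1': 2.0, 'A2': 1.0, 'A3': 1.0}, matching the desired per-hour dictionary: the 10 second recording spans a single (partial) hour, so each channel's count is divided by 1.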
Upvotes: 1
Reputation: 26676
Group by channels, count the occurrences, and divide by the mean of end_time. The number of occurrences per group is simply the group size, and since end_time hardly varies within a group, there is no harm in using its mean as the window length here.
df.groupby('channels')['end_time'].agg(lambda x: x.count()/x.mean()).to_dict()
Upvotes: 0