Dave N

Reputation: 113

With pandas, how do I calculate a rolling number of events in the last second given timestamp data?

I have a dataset where I calculate service times based on request and response times. I would like to add a count of requests in the last second, to show the obvious relationship that as we get more requests per second, the system slows. Here is the data I have, for example:

serviceTimes.head()
Out[71]: 
     Id                   Req_Time                   Rsp_Time     ServiceTime
0   3_1 2015-02-13 14:07:08.729000 2015-02-13 14:07:08.821000 00:00:00.092000
1   3_2 2015-02-13 14:07:08.929000 2015-02-13 14:07:08.929000        00:00:00
2  3_12 2015-02-13 14:11:53.908000 2015-02-13 14:11:53.981000 00:00:00.073000
3  3_14 2015-02-13 14:11:54.111000 2015-02-13 14:11:54.250000 00:00:00.139000
4  3_15 2015-02-13 14:11:54.111000 2015-02-13 14:11:54.282000 00:00:00.171000

For this I would like a rolling data set of something like:

0 14:07:08 2
1 14:11:53 1
2 14:11:54 2

I've tried rolling_sum and rolling_count, but either I'm using them wrong or I'm misunderstanding their period argument, because they aren't giving me what I want.

Upvotes: 0

Views: 1442

Answers (2)

Alexander

Reputation: 109616

You first need to transform the timestamp into a string truncated to the second, which you can then group by to show the count and average service time per second:

serviceTimes['timestamp'] = [t.strftime('%y-%m-%d %H:%M:%S') for t in serviceTimes.Req_Time]
serviceTimes.groupby('timestamp')['ServiceTime'].agg(['mean', 'count'])
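
On the five sample rows above, this produces something like the following (the means are the per-second averages of ServiceTime):

                             mean  count
timestamp                               
15-02-13 14:07:08 00:00:00.046000      2
15-02-13 14:11:53 00:00:00.073000      1
15-02-13 14:11:54 00:00:00.155000      2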

Alternatively, create a data frame of the request times in the appropriate string format, e.g. 15-02-13 14:07:08, then count the occurrences of each timestamp using value_counts(). You can also plot the results quite easily.

df = pd.DataFrame([t.strftime('%y-%m-%d %H:%M:%S') for t in serviceTimes.Req_Time],
                  columns=['timestamp'])
response = df.timestamp.value_counts()
response.sort_index().plot(rot=90)  # sort by time so the x-axis runs chronologically
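
Since Req_Time is already a datetime column, another route worth sketching (assuming a pandas version where resample supports this) is to index by time and resample per second; note that resample also emits zero-count rows for the empty seconds between requests:

per_sec = serviceTimes.set_index('Req_Time').resample('s').size()
per_sec = per_sec[per_sec > 0]  # keep only the seconds that actually saw requests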

[plot of request counts per timestamp]

Upvotes: 2

Zachary Cross

Reputation: 2318

For your problem, it looks like you want to summarize your data set using a split-apply-combine approach. See here for the documentation that will help you get your code working, but basically you'll want to do the following:

  • Create a new column (say, 'Req_Time_Sec') that holds Req_Time truncated to second resolution (e.g. 14:07:08.729000 becomes 14:07:08).
  • Use groups = serviceTimes.groupby('Req_Time_Sec') to split your data set into sub-groups based on which second each request occurs in.
  • Finally, create a new data set by calculating the length of each sub-group (which is the number of requests in that second) and aggregating the results into a single DataFrame (something like new_df = groups.aggregate(len)).

The above is all untested pseudo-code, but it, along with the link to the documentation, should help you get where you want to go.
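
Here is a minimal, runnable sketch of those steps (assuming, as in the question's sample, that Req_Time is already a datetime64 column):

# Step 1: truncate each request time to whole-second resolution.
serviceTimes['Req_Time_Sec'] = serviceTimes['Req_Time'].dt.floor('s')

# Step 2: split the data into one sub-group per second.
groups = serviceTimes.groupby('Req_Time_Sec')

# Step 3: the size of each sub-group is the number of requests in that second.
requests_per_second = groups.size()
print(requests_per_second)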

Upvotes: 3
