Dave N

Reputation: 113

With pandas, how do I calculate a rolling number of events in the last second given timestamp data?

I have a dataset where I calculate service times based on request and response times. I would like to add a count of requests in the last second, to show the obvious relationship that as we get more requests per second, the system slows. Here is the data I have, for example:

serviceTimes.head()
Out[71]: 
     Id                   Req_Time                   Rsp_Time     ServiceTime
0   3_1 2015-02-13 14:07:08.729000 2015-02-13 14:07:08.821000 00:00:00.092000
1   3_2 2015-02-13 14:07:08.929000 2015-02-13 14:07:08.929000        00:00:00
2  3_12 2015-02-13 14:11:53.908000 2015-02-13 14:11:53.981000 00:00:00.073000
3  3_14 2015-02-13 14:11:54.111000 2015-02-13 14:11:54.250000 00:00:00.139000
4  3_15 2015-02-13 14:11:54.111000 2015-02-13 14:11:54.282000 00:00:00.171000

For this I would like a rolling data set of something like:

0 14:07:08 2
1 14:11:53 1
2 14:11:54 2

I've tried rolling_sum and rolling_count, but either I'm using them wrong or I'm misunderstanding their period argument, because they aren't giving me what I want.

Upvotes: 0

Views: 1442

Answers (2)

Alexander

Reputation: 109616

You first need to transform the timestamp into a string truncated to the second, which you can then group by to show the count and average service time per second:

serviceTimes['timestamp'] = [t.strftime('%y-%m-%d %H:%M:%S') for t in serviceTimes.Req_Time]
serviceTimes.groupby('timestamp')['ServiceTime'].agg(['mean', 'count'])
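
On the five sample rows above, this produces something like the following (the means are the per-second averages of ServiceTime):

                             mean  count
timestamp                               
15-02-13 14:07:08 00:00:00.046000      2
15-02-13 14:11:53 00:00:00.073000      1
15-02-13 14:11:54 00:00:00.155000      2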

Alternatively, create a data frame of the request times in the appropriate string format, e.g. 15-02-13 14:07:08, then count the occurrences of each timestamp using value_counts(). You can also plot the results quite easily.

df = pd.DataFrame([t.strftime('%y-%m-%d %H:%M:%S') for t in serviceTimes.Req_Time],
                  columns=['timestamp'])
response = df.timestamp.value_counts()
response.sort_index().plot(rot=90)  # sort by time so the x-axis runs chronologically
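
Since Req_Time is already a datetime column, another route worth sketching (assuming a pandas version where resample supports this) is to index by time and resample per second; note that resample also emits zero-count rows for the empty seconds between requests:

per_sec = serviceTimes.set_index('Req_Time').resample('s').size()
per_sec = per_sec[per_sec > 0]  # keep only the seconds that actually saw requests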

[plot of request counts per timestamp]

Upvotes: 2

Zachary Cross

Reputation: 2318

For your problem, it looks like you want to summarize your data set using a split-apply-combine approach. See here for the documentation that will help you get your code working, but basically you'll want to do the following:

  • Create a new column (say, 'Req_Time_Sec') that holds Req_Time truncated to second resolution (e.g. 14:07:08.729000 becomes 14:07:08).
  • Use groups = serviceTimes.groupby('Req_Time_Sec') to split your data set into sub-groups based on which second each request occurs in.
  • Finally, create a new data set by calculating the length of each sub-group (which is the number of requests in that second) and aggregating the results into a single DataFrame (something like new_df = groups.aggregate(len)).

The above is all untested pseudo-code, but it, along with the link to the documentation, should help you get where you want to go.
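
Here is a minimal, runnable sketch of those steps (assuming, as in the question's sample, that Req_Time is already a datetime64 column):

# Step 1: truncate each request time to whole-second resolution.
serviceTimes['Req_Time_Sec'] = serviceTimes['Req_Time'].dt.floor('s')

# Step 2: split the data into one sub-group per second.
groups = serviceTimes.groupby('Req_Time_Sec')

# Step 3: the size of each sub-group is the number of requests in that second.
requests_per_second = groups.size()
print(requests_per_second)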

Upvotes: 3
