Reputation: 113
I have a dataset where I calculate service times based on request and response times. I would like to add a count of the requests in the last second, to show the obvious relationship that as we get more requests per second, the system slows down. Here is the data I have, for example:
serviceTimes.head()
Out[71]:
     Id                   Req_Time                   Rsp_Time      ServiceTime
0   3_1 2015-02-13 14:07:08.729000 2015-02-13 14:07:08.821000  00:00:00.092000
1   3_2 2015-02-13 14:07:08.929000 2015-02-13 14:07:08.929000         00:00:00
2  3_12 2015-02-13 14:11:53.908000 2015-02-13 14:11:53.981000  00:00:00.073000
3  3_14 2015-02-13 14:11:54.111000 2015-02-13 14:11:54.250000  00:00:00.139000
4  3_15 2015-02-13 14:11:54.111000 2015-02-13 14:11:54.282000  00:00:00.171000
For this I would like a rolling data set of something like:
0 14:07:08 2
1 14:11:53 1
2 14:11:54 2
I've tried rolling_sum and rolling_count, but either I am using them wrong or I am misunderstanding the period argument, because they are not working for me.
Upvotes: 0
Views: 1442
Reputation: 109616
You first need to transform the timestamp into a string, which you can then group by to show the count and average service time:
# bin each request by minute, then compute the count and mean service time per bin
serviceTimes['timestamp'] = [t.strftime('%y-%m-%d %H:%M') for t in serviceTimes.Req_Time]
serviceTimes.groupby('timestamp')['ServiceTime'].agg(['mean', 'count'])
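The format string above bins by minute; since the question asks for counts per second, the same approach should work with seconds included in the format string. A sketch, using a hypothetical timestamp_sec column name:
# per-second variant (sketch): add %S so each bin is one second wide
serviceTimes['timestamp_sec'] = [t.strftime('%y-%m-%d %H:%M:%S') for t in serviceTimes.Req_Time]
serviceTimes.groupby('timestamp_sec')['ServiceTime'].agg(['mean', 'count'])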
Alternatively, create a data frame of the request times in the appropriate string format, e.g. 15-02-13 14:07, then count the occurrences of each timestamp using value_counts(). You can also plot the results quite easily.
import pandas as pd

# build a one-column frame of request times as minute-resolution strings
df = pd.DataFrame([t.strftime('%y-%m-%d %H:%M') for t in serviceTimes.Req_Time],
                  columns=['timestamp'])

# count how many requests fall into each time bin, then plot
response = df.timestamp.value_counts()
response.plot(rot=90)
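To show the relationship the question is ultimately after (more requests per second means slower responses), one option is to map the per-bin counts back onto each row and plot service time against load. A sketch, assuming ServiceTime is a timedelta column and reusing the timestamp column and response counts from above; requests_in_bin and service_secs are just illustrative names:
# attach each row's bin count, convert service time to seconds, and plot latency vs. load
serviceTimes['requests_in_bin'] = serviceTimes['timestamp'].map(response)
serviceTimes['service_secs'] = serviceTimes['ServiceTime'].dt.total_seconds()
serviceTimes.plot(kind='scatter', x='requests_in_bin', y='service_secs')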
Upvotes: 2
Reputation: 2318
For your problem, it looks like you want to summarize your data set using a split-apply-combine approach. See here for the documentation that will help you get your code working, but basically you'll want to do the following:
1. Create a new column (e.g. Req_Time_Sec) that truncates Req_Time down to second resolution (e.g. 14:07:08.729000 becomes 14:07:08).
2. Use groups = serviceTimes.groupby('Req_Time_Sec') to separate your data set into sub-groups based on which second each request occurs in.
3. Use new_df = groups.aggregate(len) to count the requests in each group.
The above is all untested pseudo-code, but it, along with the link to the documentation, should help you get where you want to go.
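A minimal sketch of those steps (untested against the original data, and assuming Req_Time can be parsed as datetimes):
import pandas as pd

# make sure Req_Time is a proper datetime column before truncating
serviceTimes['Req_Time'] = pd.to_datetime(serviceTimes['Req_Time'])

# 1. truncate each request time to whole-second resolution
serviceTimes['Req_Time_Sec'] = serviceTimes['Req_Time'].dt.floor('s')

# 2. split the data into one group per second
groups = serviceTimes.groupby('Req_Time_Sec')

# 3. combine: count the rows in each group to get requests per second
requests_per_second = groups.size()
Here groups.size() counts every row in each group; the aggregate(len) call in the pseudo-code gives the same counts, repeated once per column.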
Upvotes: 3