Alejandro A
Alejandro A

Reputation: 1190

Best approach to calculate time difference between two rows

Scenario

I have a dataframe with a given structure,and to sum it up at the end I want to find the time difference between the responses and requests of a service. It has the following columns:

And an example of data would be:

Timestamp   Service      Command     Message_Type   Message_ID
12:00:00    FoodOrders  SeeStock()  Request        125
12:00:02    FoodOrders  SeeStock()  Response       125

The output would have to be something like

Service   Command   Message_ID  TimeDiff
FoodOrders  SeeStock  125       00:00:02

What have I thought of doing

Grouping by Service,Command,Message_ID and add an additional column with some function that calculates the difference of time.

My actual questions

Thanks.

Upvotes: 1

Views: 282

Answers (4)

jezrael
jezrael

Reputation: 863751

If performance is important, avoid aggregation and groupby, because slow, better is create Response and Response Series with MultiIndex and subtract Timestamps, sort_index should also help with performance:

#if necessary
#df['Timestamp'] = pd.to_timedelta(df['Timestamp'])

cols = ['Service','Command','Message_ID']
s1 = df[df['Message_Type'] == 'Response'].set_index(cols)['Timestamp'].sort_index()
s2 = df[df['Message_Type'] == 'Request'].set_index(cols)['Timestamp'].sort_index()

df1 = s1.sub(s2).reset_index()
print (df1)
      Service     Command  Message_ID Timestamp
0  FoodOrders  SeeStock()         125  00:00:02

Upvotes: 1

If you use jupiter notebook, you can try something like this:

%timeit df.sort_values('Time').groupby(['Service', 'Command', 'Message_Type', 'Message_ID']).apply(lambda x: x.iloc[1]['Time'] - x.iloc[0]['Time'])

In my sample I have this out:

2.97 ms ± 310 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

And I also think that it's a good plan = )

Upvotes: 1

PV8
PV8

Reputation: 6280

followed by this code from another post:

import time

start = time.time()
print("hello")
end = time.time()
print(end - start)

you can measure the time on your own.

Try oyur approach and the lambda to test it.

Upvotes: 1

GZ0
GZ0

Reputation: 4268

The plan is more or less OK. Note that for efficiency it would be better not to pass a lambda function directly to calculate a custom aggregation like TimeDiff. It is better to first calculate auxiliary aggregations that can be done with pandas / numpy built-ins and then compute your custom aggregation based on those.

Upvotes: 1

Related Questions