Reputation: 217
I have a dataframe like this:
user datetime mode
-------------------------------------------------
1 2015-09-10 11:50:27 vehicle
1 2015-11-22 10:15:03 vehicle
1 2015-11-23 10:35:03 stop
2 2015-11-22 10:11:13 walk
2 2015-11-22 10:13:08 walk
2 2015-09-10 10:21:52 stop
I'm struggling to calculate time of the trip(mode) for every user, for every day of the month.
My idea was to extract month, day, hour and then group by user, month, day, mode and hour to calculate the difference between max and min values using this.
df.assign(output=main_table.groupby(['user','month_n','day_n','mode','hour_n']).datetime
.apply(lambda x: x - x.iloc[0]))
however when i try summ up the outputs
df.groupby(['user','month_n','day_n','mode','hour_n'])['output'].sum()
it doesn't seem to produce correct result.
My desired output is
user month day mode time_spent(sec)
-------------------------------------------------
1 10 5 vehicle 3600
1 10 5 walk 12345
1 10 5 stop 25879
1 10 6 walk 15
1 10 6 vehicle 98522
2 10 5 walk 1298522
2 10 11 vehicle 99622
3 10 6 vehicle 23247
Any help will be much much appreciated!! Thanks.
UPDATED a better example of df
user datetime mode
-------------------------------------------------
1 10/09/2015 11:50:27 vehicle
1 10/09/2015 11:50:37 vehicle
1 10/09/2015 11:52:57 vehicle
1 10/09/2015 11:53:27 vehicle
1 10/09/2015 10:21:52 walk
1 10/09/2015 11:52:02 walk
1 10/09/2015 11:53:32 walk
1 10/09/2015 10:23:32 walk
1 10/09/2015 11:50:22 vehicle
1 10/09/2015 11:50:57 vehicle
2 22/11/2015 10:15:53 walk
2 22/11/2015 10:13:53 walk
2 22/11/2015 10:16:08 walk
2 22/11/2015 10:15:38 walk
2 22/11/2015 10:16:23 walk
2 22/11/2015 10:10:33 walk
2 22/11/2015 10:15:03 walk
2 22/11/2015 10:11:13 walk
2 22/11/2015 10:13:08 walk
2 22/11/2015 10:10:28 walk
to add some context dataset contains many users, several weeks of datetime and 10 different modes which could be repeated numerous time during the day, each of them have start/end timestamp.
Upvotes: 0
Views: 340
Reputation: 3926
Here is how I will go:
from io import StringIO
import pandas as pd
s = """user,datetime,mode
1, 10/09/2015 11:50:27, vehicle
1, 10/09/2015 11:50:37, vehicle
1, 10/09/2015 11:52:57, vehicle
1, 10/09/2015 11:53:27, vehicle
1, 10/09/2015 10:21:52, walk
1, 10/09/2015 11:52:02, walk
1, 10/09/2015 11:53:32, walk
1, 10/09/2015 10:23:32, walk
1, 10/09/2015 11:50:22, vehicle
1, 10/09/2015 11:50:57, vehicle
2, 22/11/2015 10:15:53 , walk
2, 22/11/2015 10:13:53 , walk
2, 22/11/2015 10:16:08 , walk
2, 22/11/2015 10:15:38 , walk
2, 22/11/2015 10:16:23 , walk
2, 22/11/2015 10:10:33 , walk
2, 22/11/2015 10:15:03 , walk
2, 22/11/2015 10:11:13 , walk
2, 22/11/2015 10:13:08 , walk
2, 22/11/2015 10:10:28 , walk"""
df = pd.read_csv(StringIO(s))
df.datetime = pd.to_datetime(df.datetime)
df.groupby(["user", "mode"]).datetime.max() - df.groupby(
["user", "mode"]
).datetime.min()
It generates the desired output:
Upvotes: 1