Jeroen
Jeroen

Reputation: 841

Pandas group by datetime within column level

I have a dataframe created by:

df = pd.DataFrame({})
df['Date']  = pd.to_datetime(np.arange(0,12), unit='h', origin='2018-08-01  06:00:00')
df['ship'] = [1,1,2,2,2,3,3,3,3,3,3,3] # ship ID number
dt_trip = 4 # maximum duration of each trip to be classified as the same trip


                  Date  ship
0  2018-08-01 06:00:00     1
1  2018-08-01 07:00:00     1
2  2018-08-01 08:00:00     2
3  2018-08-01 09:00:00     2
4  2018-08-01 10:00:00     2
5  2018-08-01 11:00:00     3
6  2018-08-01 12:00:00     3
7  2018-08-01 13:00:00     3
8  2018-08-01 14:00:00     3
9  2018-08-01 15:00:00     3
10 2018-08-01 16:00:00     3
11 2018-08-01 17:00:00     3

I try to get a a new column which shows the trips of each ship. Each trip is defined by an interval of 4 hours with respect to the start of the trip. When a new ship number is on the next row, automatically a new trip should start (irrespective of the previous datetime). From a previous post I got a solution for the trips.

origin = df["Date"][0].hour
df["Trip"] = df.apply(lambda x: ((x["Date"].hour - origin) // dt_trip) + 1, axis=1)
df["Trip"] = df.groupby(['Trip','ship']).ngroup() +1 # trip starts at: 1

This solution takes a new trip when the ship-column changes its row. The only change I want to have is to change the origin to the datetime when a new trip starts. So index 4 should have Trip = 2, because the ship is the same and the time difference between the start of the trip (index=2). Now it looks at the first given datetime.

Desired solution looks like:

                  Date  ship  Trip  Trip_desired
0  2018-08-01 06:00:00     1     1  1
1  2018-08-01 07:00:00     1     1  1
2  2018-08-01 08:00:00     2     2  2
3  2018-08-01 09:00:00     2     2  2
4  2018-08-01 10:00:00     2     3  2
5  2018-08-01 11:00:00     3     4  3
6  2018-08-01 12:00:00     3     4  3
7  2018-08-01 13:00:00     3     4  3
8  2018-08-01 14:00:00     3     5  3
9  2018-08-01 15:00:00     3     5  4
10 2018-08-01 16:00:00     3     5  4
11 2018-08-01 17:00:00     3     5  4

Upvotes: 1

Views: 55

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150735

I would do:

total_time = df['Date'] - df.groupby('ship')['Date'].transform('min')
trips = total_time.dt.total_seconds().fillna(0)//(dt_trip*3600)

df['trip'] = df.groupby(['ship', trips]).ngroup()+1

Output:

                  Date  ship  trip
0  2018-08-01 06:00:00     1     1
1  2018-08-01 07:00:00     1     1
2  2018-08-01 08:00:00     2     2
3  2018-08-01 09:00:00     2     2
4  2018-08-01 10:00:00     2     2
5  2018-08-01 11:00:00     3     3
6  2018-08-01 12:00:00     3     3
7  2018-08-01 13:00:00     3     3
8  2018-08-01 14:00:00     3     3
9  2018-08-01 15:00:00     3     4
10 2018-08-01 16:00:00     3     4
11 2018-08-01 17:00:00     3     4

Upvotes: 2

Related Questions