Reputation: 841
I have a dataframe created by:
df = pd.DataFrame({})
df['Date'] = pd.to_datetime(np.arange(0,12), unit='h', origin='2018-08-01 06:00:00')
df['ship'] = [1,1,2,2,2,3,3,3,3,3,3,3] # ship ID number
dt_trip = 4 # maximum duration of each trip to be classified as the same trip
Date ship
0 2018-08-01 06:00:00 1
1 2018-08-01 07:00:00 1
2 2018-08-01 08:00:00 2
3 2018-08-01 09:00:00 2
4 2018-08-01 10:00:00 2
5 2018-08-01 11:00:00 3
6 2018-08-01 12:00:00 3
7 2018-08-01 13:00:00 3
8 2018-08-01 14:00:00 3
9 2018-08-01 15:00:00 3
10 2018-08-01 16:00:00 3
11 2018-08-01 17:00:00 3
I try to get a a new column which shows the trips of each ship. Each trip is defined by an interval of 4 hours with respect to the start of the trip. When a new ship number is on the next row, automatically a new trip should start (irrespective of the previous datetime). From a previous post I got a solution for the trips.
origin = df["Date"][0].hour
df["Trip"] = df.apply(lambda x: ((x["Date"].hour - origin) // dt_trip) + 1, axis=1)
df["Trip"] = df.groupby(['Trip','ship']).ngroup() +1 # trip starts at: 1
This solution takes a new trip when the ship-column changes its row. The only change I want to have is to change the origin to the datetime when a new trip starts. So index 4 should have Trip = 2, because the ship is the same and the time difference between the start of the trip (index=2). Now it looks at the first given datetime.
Desired solution looks like:
Date ship Trip Trip_desired
0 2018-08-01 06:00:00 1 1 1
1 2018-08-01 07:00:00 1 1 1
2 2018-08-01 08:00:00 2 2 2
3 2018-08-01 09:00:00 2 2 2
4 2018-08-01 10:00:00 2 3 2
5 2018-08-01 11:00:00 3 4 3
6 2018-08-01 12:00:00 3 4 3
7 2018-08-01 13:00:00 3 4 3
8 2018-08-01 14:00:00 3 5 3
9 2018-08-01 15:00:00 3 5 4
10 2018-08-01 16:00:00 3 5 4
11 2018-08-01 17:00:00 3 5 4
Upvotes: 1
Views: 55
Reputation: 150735
I would do:
total_time = df['Date'] - df.groupby('ship')['Date'].transform('min')
trips = total_time.dt.total_seconds().fillna(0)//(dt_trip*3600)
df['trip'] = df.groupby(['ship', trips]).ngroup()+1
Output:
Date ship trip
0 2018-08-01 06:00:00 1 1
1 2018-08-01 07:00:00 1 1
2 2018-08-01 08:00:00 2 2
3 2018-08-01 09:00:00 2 2
4 2018-08-01 10:00:00 2 2
5 2018-08-01 11:00:00 3 3
6 2018-08-01 12:00:00 3 3
7 2018-08-01 13:00:00 3 3
8 2018-08-01 14:00:00 3 3
9 2018-08-01 15:00:00 3 4
10 2018-08-01 16:00:00 3 4
11 2018-08-01 17:00:00 3 4
Upvotes: 2