import pandas as pd
df = pd.DataFrame(
[
['China', 'L', '08/06/2022 20:00', '08/10/2022 20:00'],
['China', 'L', '8/13/2022 00:54', '8/14/2022 00:54'],
['China', 'M', '8/14/2022 00:54', '8/14/2022 12:54'],
['United Kingdom', 'L', '8/27/2022 06:36', '8/31/2022 21:08'],
['United Kingdom', 'L', '9/01/2022 21:08', '09/02/2022 21:38'],
['China', 'D', '09/04/2022 21:38', '09/06/2022 21:38']
],
columns=['Country', 'Function', 'Arrival', 'Departure']
)
In this case, I want to merge consecutive duplicate rows (same Country and Function) into one, keeping the first row's Arrival and replacing its Departure with the value from the last duplicate, under the following two conditions:
The result should look like this:
df = pd.DataFrame(
[
['China', 'L', '08/06/2022 20:00', '8/14/2022 00:54'],
['China', 'M', '8/14/2022 00:54', '8/14/2022 12:54'],
['United Kingdom', 'L', '8/27/2022 06:36', '09/02/2022 21:38'],
['China', 'D', '09/04/2022 21:38', '09/06/2022 21:38']
],
columns=['Country', 'Function', 'Arrival', 'Departure']
)
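The two conditions are not shown here, so this is an assumption: "consecutive duplicates" is taken to mean adjacent rows, in the current order, with equal Country and Function. Under that assumption, each run of such rows can be given a shared label with `shift` and `cumsum`. This is a minimal sketch, and the name `run_id` is hypothetical:

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['China', 'L', '08/06/2022 20:00', '08/10/2022 20:00'],
        ['China', 'L', '8/13/2022 00:54', '8/14/2022 00:54'],
        ['China', 'M', '8/14/2022 00:54', '8/14/2022 12:54'],
        ['United Kingdom', 'L', '8/27/2022 06:36', '8/31/2022 21:08'],
        ['United Kingdom', 'L', '9/01/2022 21:08', '09/02/2022 21:38'],
        ['China', 'D', '09/04/2022 21:38', '09/06/2022 21:38'],
    ],
    columns=['Country', 'Function', 'Arrival', 'Departure'],
)

keys = df[['Country', 'Function']]
# A new run starts wherever Country or Function differs from the previous row.
# The first row always starts a run because shift() fills it with NaN.
run_id = keys.ne(keys.shift()).any(axis=1).cumsum()
print(run_id.tolist())  # [1, 1, 2, 3, 3, 4]
```

Rows 0 and 1 share run 1, and rows 3 and 4 share run 3. These are the rows that are merged in the expected output. The final China/D row starts a new run even though China appeared earlier.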
You can use groupby.idxmax to keep, for each Country/Function pair, the row with the latest Departure:
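# parse Departure as datetimes, then take the index label of the latest
# Departure within each (Country, Function) group; sort=False keeps the
# groups in order of first appearance
idx = (pd.to_datetime(df['Departure'])
       .groupby([df['Country'], df['Function']], sort=False)
       .idxmax()
      )
out = df.loc[idx]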
Output:
          Country Function           Arrival         Departure
1           China        L   8/13/2022 00:54   8/14/2022 00:54
2           China        M   8/14/2022 00:54   8/14/2022 12:54
4  United Kingdom        L   9/01/2022 21:08  09/02/2022 21:38
5           China        D  09/04/2022 21:38  09/06/2022 21:38
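Note that idxmax returns the whole row that holds the latest Departure. In the output above, the Arrival for China/L and United Kingdom/L therefore comes from the last duplicate (8/13/2022 00:54 and 9/01/2022 21:08). The desired result uses the first duplicate instead (08/06/2022 20:00 and 8/27/2022 06:36). Grouping only by Country and Function would also merge repeats that are not adjacent.

If the first Arrival of each run is needed, one option is to label the runs and then aggregate with first/last. This sketch assumes that "consecutive" means adjacent rows, in the current order, with equal Country and Function. The name `run_id` is hypothetical:

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['China', 'L', '08/06/2022 20:00', '08/10/2022 20:00'],
        ['China', 'L', '8/13/2022 00:54', '8/14/2022 00:54'],
        ['China', 'M', '8/14/2022 00:54', '8/14/2022 12:54'],
        ['United Kingdom', 'L', '8/27/2022 06:36', '8/31/2022 21:08'],
        ['United Kingdom', 'L', '9/01/2022 21:08', '09/02/2022 21:38'],
        ['China', 'D', '09/04/2022 21:38', '09/06/2022 21:38'],
    ],
    columns=['Country', 'Function', 'Arrival', 'Departure'],
)

keys = df[['Country', 'Function']]
# Label each run of adjacent rows sharing Country and Function (assumed rule).
run_id = keys.ne(keys.shift()).any(axis=1).cumsum()

out = (
    df.groupby(run_id, sort=False)
      .agg(Country=('Country', 'first'),
           Function=('Function', 'first'),
           Arrival=('Arrival', 'first'),      # arrival of the first row in the run
           Departure=('Departure', 'last'))   # departure of the last row in the run
      .reset_index(drop=True)
)
print(out)
```

With the sample data, this reproduces the four rows of the desired result. Because it takes the positionally last Departure rather than the latest one, it does not depend on parsing the mixed date formats.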