How to remove duplicate rows in pandas with multiple conditions

Question

import pandas as pd

df = pd.DataFrame(
    [
        ['China', 'L', '08/06/2022 20:00', '08/10/2022 20:00'],
        ['China', 'L', '8/13/2022 00:54', '8/14/2022 00:54'],
        ['China', 'M', '8/14/2022 00:54', '8/14/2022 12:54'],
        ['United Kingdom', 'L', '8/27/2022 06:36', '8/31/2022 21:08'],
        ['United Kingdom', 'L', '9/01/2022 21:08', '09/02/2022 21:38'],
        ['China', 'D', '09/04/2022 21:38', '09/06/2022 21:38']
    ],
    columns=['Country', 'Function', 'Arrival', 'Departure']
)

In this case, I want to collapse consecutive duplicate rows (adjacent rows with the same 'Country') into a single row and replace the departure time with the value from the last duplicate, subject to two conditions:

Do not remove duplicates that are not consecutive.
If the 'Function' column changes, do not treat the rows as duplicates, even if they are consecutive.

So it should look like this:

df = pd.DataFrame(
    [
        ['China', 'L', '08/06/2022 20:00', '8/14/2022 00:54'],
        ['China', 'M', '8/14/2022 00:54', '8/14/2022 12:54'],
        ['United Kingdom', 'L', '8/27/2022 06:36', '09/02/2022 21:38'],
        ['China', 'D', '09/04/2022 21:38', '09/06/2022 21:38']
    ],
    columns=['Country', 'Function', 'Arrival', 'Departure']
)
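
One possible way to get this result is sketched below. It assumes that "consecutive duplicates" means adjacent rows sharing both 'Country' and 'Function', and that the rows are already in chronological order. The names keys, run_id and result are illustrative only and not part of the original question. The idea is to compare each row with the previous one, start a new run whenever either key changes, and then keep the first 'Arrival' and the last 'Departure' of each run.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['China', 'L', '08/06/2022 20:00', '08/10/2022 20:00'],
        ['China', 'L', '8/13/2022 00:54', '8/14/2022 00:54'],
        ['China', 'M', '8/14/2022 00:54', '8/14/2022 12:54'],
        ['United Kingdom', 'L', '8/27/2022 06:36', '8/31/2022 21:08'],
        ['United Kingdom', 'L', '9/01/2022 21:08', '09/02/2022 21:38'],
        ['China', 'D', '09/04/2022 21:38', '09/06/2022 21:38']
    ],
    columns=['Country', 'Function', 'Arrival', 'Departure']
)

# Assumption: a row continues the previous run only if both keys match.
keys = df[['Country', 'Function']]

# True wherever Country or Function differs from the row above;
# the cumulative sum turns those change points into run labels.
run_id = keys.ne(keys.shift()).any(axis=1).cumsum()

# Within each run keep the first Arrival and the last Departure.
result = (
    df.groupby(run_id, sort=False)
      .agg({
          'Country': 'first',
          'Function': 'first',
          'Arrival': 'first',
          'Departure': 'last',
      })
      .reset_index(drop=True)
)

print(result)
```

Because the run label only increases when a key changes between neighbouring rows, the later 'China' rows are never merged with the earlier ones. The timestamps stay as strings here; converting them with pd.to_datetime first would be a separate step.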

Answers (1)