How to delete the duplicates and keep the first row only when these rows are next to each other

Question

How can I drop rows with repeated values in a given column, keeping the first one, only when the repeated rows are next to each other?
The pandas method pd.DataFrame.drop_duplicates does not solve this, because it drops all duplicated rows, even those that are not next to each other.

Code Example

import pandas as pd

example_df = pd.DataFrame({'name':['John','Mery','Sarah','Jay','Lala','Mike'], 
                           'Day':['Monday','Monday','Tuesday','Tuesday','Monday','Tuesday']})

example_df
>>>    name      Day
0      John     Monday
1      Mery     Monday
2      Sarah    Tuesday
3      Jay      Tuesday
4      Lala     Monday
5      Mike     Tuesday

My desired output, stored in the variable desired_df, is:

desired_df
>>>    name      Day
0      John     Monday
1      Sarah    Tuesday
2      Lala     Monday
3      Mike     Tuesday

As you can see above, only duplicates that are next to each other are deleted.

ggaurav · Accepted Answer

You can use shift and compare the result with the original column. Wherever the two values differ, the row is not a consecutive duplicate and can be retained.

example_df[example_df['Day'].shift(1) != example_df['Day']]

    name    Day
0   John    Monday
2   Sarah   Tuesday
4   Lala    Monday
5   Mike    Tuesday
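
Note that the filtered frame keeps its original index labels (0, 2, 4, 5), while desired_df in the question is numbered 0 to 3. A minimal sketch, assuming a fresh 0-based index is wanted, chains reset_index(drop=True) onto the same shift comparison. The names keep and result_df are only illustrative:

```python
import pandas as pd

example_df = pd.DataFrame({
    'name': ['John', 'Mery', 'Sarah', 'Jay', 'Lala', 'Mike'],
    'Day': ['Monday', 'Monday', 'Tuesday', 'Tuesday', 'Monday', 'Tuesday'],
})

# Keep a row only when its Day differs from the Day of the row above it.
keep = example_df['Day'].shift(1) != example_df['Day']

# reset_index(drop=True) renumbers the surviving rows from 0
# and discards the old index labels instead of keeping them as a column.
result_df = example_df[keep].reset_index(drop=True)
print(result_df)
```

Without drop=True, the old labels would be kept as an extra column named index.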

For reference, here is example_df with the shifted Day column alongside:

example_df['Day_1_shifted'] = example_df['Day'].shift(1)

    name    Day     Day_1_shifted
0   John    Monday  NaN
1   Mery    Monday  Monday
2   Sarah   Tuesday Monday
3   Jay     Tuesday Tuesday
4   Lala    Monday  Tuesday
5   Mike    Tuesday Monday
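
The table also shows why the first row always survives: shift(1) leaves a missing value in row 0, and a missing value never compares equal to a day name. As a small sketch, the boolean mask behind the filter can be inspected on its own, without adding a helper column to the frame. The names days and mask are illustrative:

```python
import pandas as pd

days = pd.Series(
    ['Monday', 'Monday', 'Tuesday', 'Tuesday', 'Monday', 'Tuesday'],
    name='Day',
)

# True where a row starts a new run of values,
# False where it repeats the value of the row directly above.
mask = days.shift(1) != days
print(mask.tolist())  # [True, False, True, False, True, True]
```

Rows 1 and 3 come out False, and those are exactly the rows missing from the filtered result.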
