Eiffelbear

Reputation: 428

How to drop consecutive duplicate rows and keep only the first row of each run

Question

Code Example

import pandas as pd

example_df = pd.DataFrame({'name':['John','Mery','Sarah','Jay','Lala','Mike'], 
                           'Day':['Monday','Monday','Tuesday','Tuesday','Monday','Tuesday']})

example_df
>>>    name     Day
0      John     Monday
1      Mery     Monday
2      Sarah    Tuesday
3      Jay      Tuesday
4      Lala     Monday
5      Mike     Tuesday

desired_df
>>>    name     Day
0      John     Monday
1      Sarah    Tuesday
2      Lala     Monday
3      Mike     Tuesday

Upvotes: 0

Views: 834

Answers (2)

MaxYarmolinsky

Reputation: 1139

Here's my ugly, slow solution that also works : )

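# Work on a copy so rows can be dropped while looping over the original.
temp = example_df.copy()

# Note: relies on the default 0..n-1 integer index, since it looks up label index+1.
for index, row in example_df.iterrows():
    # The last row has no successor to compare against.
    if index == len(example_df.index) - 1:
        break
    # Drop the next row when it repeats the current row's Day.
    if example_df.loc[index, 'Day'] == example_df.loc[index + 1, 'Day']:
        temp.drop(index + 1, inplace=True)

example_df = temp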

Upvotes: 0

ggaurav

Reputation: 1804

You can use shift and then compare the shifted column with the original. Wherever the two values differ, the row is not a consecutive duplicate, so it can be retained.

example_df[example_df['Day'].shift(1) != example_df['Day']]

    name    Day
0   John    Monday
2   Sarah   Tuesday
4   Lala    Monday
5   Mike    Tuesday

For reference, here is example_df with the shifted Day column alongside:

example_df['Day_1_shifted'] = example_df['Day'].shift(1)

    name    Day     Day_1_shifted
0   John    Monday  NaN
1   Mery    Monday  Monday
2   Sarah   Tuesday Monday
3   Jay     Tuesday Tuesday
4   Lala    Monday  Tuesday
5   Mike    Tuesday Monday
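
The filter keeps the original index labels (0, 2, 4, 5), whereas desired_df in the question is numbered 0 to 3. A minimal sketch of one way to get that, assuming a plain renumbering is what is wanted, is to chain reset_index(drop=True) onto the filter. The variable name is_new_run is only illustrative and does not come from the question.

```python
import pandas as pd

example_df = pd.DataFrame({
    'name': ['John', 'Mery', 'Sarah', 'Jay', 'Lala', 'Mike'],
    'Day': ['Monday', 'Monday', 'Tuesday', 'Tuesday', 'Monday', 'Tuesday'],
})

# True wherever a row's Day differs from the row directly above it.
# The first row is compared against a missing value, so it is always kept.
is_new_run = example_df['Day'] != example_df['Day'].shift(1)

# Keep the first row of each consecutive run, then renumber the index from 0.
desired_df = example_df[is_new_run].reset_index(drop=True)

print(desired_df)
```

One caveat: because NaN never compares equal to NaN, consecutive missing values in Day would not be treated as duplicates by this comparison.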

Upvotes: 1
