Reputation: 428
pd.DataFrame.drop_duplicates is not an answer here, as it drops all duplicated rows, even when they are not next to each other.
import pandas as pd
example_df = pd.DataFrame({'name':['John','Mery','Sarah','Jay','Lala','Mike'],
'Day':['Monday','Monday','Tuesday','Tuesday','Monday','Tuesday']})
>>> example_df
    name      Day
0   John   Monday
1   Mery   Monday
2  Sarah  Tuesday
3    Jay  Tuesday
4   Lala   Monday
5   Mike  Tuesday
>>> desired_df
    name      Day
0   John   Monday
1  Sarah  Tuesday
2   Lala   Monday
3   Mike  Tuesday
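To make the problem concrete, here is a minimal sketch of what drop_duplicates does on this data. It assumes the call is restricted to the Day column via subset='Day', which the question does not state explicitly; the variable name deduped is only illustrative.

```python
import pandas as pd

example_df = pd.DataFrame({
    'name': ['John', 'Mery', 'Sarah', 'Jay', 'Lala', 'Mike'],
    'Day': ['Monday', 'Monday', 'Tuesday', 'Tuesday', 'Monday', 'Tuesday'],
})

# Assumption: duplicates are judged on the 'Day' column only.
# drop_duplicates keeps the first occurrence of each distinct value,
# no matter where the later repeats sit in the frame.
deduped = example_df.drop_duplicates(subset='Day')
print(deduped)
```

Only the John and Sarah rows survive, so the later, non-adjacent Monday and Tuesday rows (Lala and Mike) are lost, which is not the desired result.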
Upvotes: 0
Views: 834
Reputation: 1139
Here's my ugly, slow solution that also works : )
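temp = example_df.copy()
# Assumes the default RangeIndex (0, 1, 2, ...), so index + 1 is the next row.
for index, row in example_df.iterrows():
    if index == len(example_df.index) - 1:
        break  # the last row has no successor to compare with
    if example_df.loc[index, 'Day'] == example_df.loc[index + 1, 'Day']:
        temp.drop(index + 1, inplace=True)  # drop the consecutive repeat
example_df = temp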
Upvotes: 0
Reputation: 1804
You can use shift and then compare the result with the original column. Wherever the two values differ, the row is not a consecutive duplicate and can be retained.
example_df[example_df['Day'].shift(1) != example_df['Day']]
    name      Day
0   John   Monday
2  Sarah  Tuesday
4   Lala   Monday
5   Mike  Tuesday
For illustration, example_df along with the shifted day:
example_df['Day_1_shifted'] = example_df['Day'].shift(1)
    name      Day Day_1_shifted
0   John   Monday           NaN
1   Mery   Monday        Monday
2  Sarah  Tuesday        Monday
3    Jay  Tuesday       Tuesday
4   Lala   Monday       Tuesday
5   Mike  Tuesday        Monday
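The shift comparison keeps the original index labels (0, 2, 4, 5), whereas the desired output in the question is numbered 0 to 3. A minimal self-contained sketch that also renumbers the rows might look like this; the reset_index call and the names keep and desired_df are additions for illustration, not part of the answer above.

```python
import pandas as pd

example_df = pd.DataFrame({
    'name': ['John', 'Mery', 'Sarah', 'Jay', 'Lala', 'Mike'],
    'Day': ['Monday', 'Monday', 'Tuesday', 'Tuesday', 'Monday', 'Tuesday'],
})

# True where a row's 'Day' differs from the previous row's 'Day'.
# The first row is compared with NaN, so it is always kept.
keep = example_df['Day'] != example_df['Day'].shift(1)

# Filter, then renumber the surviving rows from 0 (an added step).
desired_df = example_df[keep].reset_index(drop=True)
print(desired_df)
```

One caveat: NaN never compares equal to NaN, so if the column itself contains consecutive missing values, this comparison would retain all of them.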
Upvotes: 1