How to get the last row of each streak in a pandas column?

Question

I have a dataframe like this:

df1 = pd.DataFrame({'x':[0,1,2,3,4,5,6,7,8,9],'y':['a','a','b','c','b','b','a','b','c','c']})

How can we return a dataframe like the following, which keeps only the last row of each run of consecutive equal values in y?

df2 = pd.DataFrame({'x':[1,2,3,5,6,7,9],'y':['a','b','c','b','a','b','c']})

Is there an efficient way to do this using column operations instead of looping through each row?

Dani Mesejo · Accepted Answer

You need to find every position where a value is different from the next one, so comparing each value to its successor is sufficient:

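# True where y differs from the next row's y; the last row compares against NaN, so it is always True
mask = df1['y'].ne(df1['y'].shift(-1))
# Keep only the last row of each streak and renumber the index
df2 = df1[mask].reset_index(drop=True)
print(df2)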

Output

   x  y
0  1  a
1  2  b
2  3  c
3  5  b
4  6  a
5  7  b
6  9  c
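
For reference, here is a minimal self-contained sketch of the same shift-and-compare idea, wrapped in a helper so it can be run on its own. The function name last_of_each_streak and its col parameter are hypothetical, introduced only for illustration; the data is the df1 from the question.

```python
import pandas as pd


def last_of_each_streak(df, col):
    """Keep the final row of every run of consecutive equal values in `col`.

    Hypothetical helper, not part of pandas.
    """
    # shift(-1) lines each row up with the value in the following row.
    # The last row is compared against NaN, so it always ends a streak.
    mask = df[col].ne(df[col].shift(-1))
    return df[mask].reset_index(drop=True)


df1 = pd.DataFrame({
    'x': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'y': ['a', 'a', 'b', 'c', 'b', 'b', 'a', 'b', 'c', 'c'],
})

result = last_of_each_streak(df1, 'y')
print(result)
```

One caveat, assuming the column may contain missing values: NaN never compares equal to NaN, so consecutive NaN entries would each be treated as their own streak with this approach.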
