Vinay

Reputation: 1261

How to get the rows of last occurrence of a streak in pandas column?

I have a DataFrame like this:

import pandas as pd
df1 = pd.DataFrame({'x':[0,1,2,3,4,5,6,7,8,9],'y':['a','a','b','c','b','b','a','b','c','c']})

How can I return a DataFrame that keeps only the last row of each streak of consecutive equal values in y, like the following?

df2 = pd.DataFrame({'x':[1,2,3,5,6,7,9],'y':['a','b','c','b','a','b','c']})

Is there an efficient way to do this with column operations instead of looping over each row?

Upvotes: 0

Views: 240

Answers (2)

Dani Mesejo

Reputation: 61920

You need to find every row whose value differs from the next one, so comparing each value with its successor is sufficient:

mask = df1['y'].ne(df1['y'].shift(-1))
df2 = df1[mask].reset_index(drop=True)
print(df2)

Output

   x  y
0  1  a
1  2  b
2  3  c
3  5  b
4  6  a
5  7  b
6  9  c
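
To see why the comparison works, here is a minimal sketch that prints the intermediate columns for the same df1. The names next_y, steps, streak_ends and last_rows are only illustrative and are not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'x': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                    'y': ['a', 'a', 'b', 'c', 'b', 'b', 'a', 'b', 'c', 'c']})

# shift(-1) moves every value up one row, so each row sees its successor.
# The last row gets NaN, which never compares equal to a real value.
next_y = df1['y'].shift(-1)

# True where the current value differs from the next one,
# i.e. where a streak ends.
mask = df1['y'].ne(next_y)

# Side-by-side view of the intermediate steps (illustrative column names).
steps = pd.DataFrame({'y': df1['y'], 'next_y': next_y, 'streak_ends': mask})
print(steps)

# Keep only the rows that end a streak.
last_rows = df1[mask].reset_index(drop=True)
print(last_rows)
```

Because the final row is compared against NaN, it is always kept, which is what you want: the last row of the frame necessarily ends the last streak.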

Upvotes: 3

Quang Hoang

Reputation: 150785

A classic application of cumsum to find consecutive blocks:

df1.groupby(df1['y'].ne(df1['y'].shift()).cumsum(), as_index=False).last()

Output:

   x  y
0  1  a
1  2  b
2  3  c
3  5  b
4  6  a
5  7  b
6  9  c
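
A minimal sketch of how the cumsum trick labels the blocks, using the same df1. The names starts, block_id and last_rows are illustrative, and the last step uses groupby(...).tail(1) rather than the .last() call above, as one possible variant that keeps the original columns and row order.

```python
import pandas as pd

df1 = pd.DataFrame({'x': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                    'y': ['a', 'a', 'b', 'c', 'b', 'b', 'a', 'b', 'c', 'c']})

# True at the first row of each streak: the value differs from the previous one.
# The very first row is compared against NaN, so it is True as well.
starts = df1['y'].ne(df1['y'].shift())

# A running count of streak starts gives every row the id of its streak.
block_id = starts.cumsum().rename('block_id')
print(df1.assign(block_id=block_id))

# Variant of the answer's last step: take the final row of each block.
last_rows = df1.groupby(block_id).tail(1).reset_index(drop=True)
print(last_rows)
```

The block ids are also handy if you later need per-streak statistics such as the streak length, e.g. via groupby(block_id).size().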

Upvotes: 2
