Reputation: 1261
I have a dataframe like this
import pandas as pd
df1 = pd.DataFrame({'x':[0,1,2,3,4,5,6,7,8,9],'y':['a','a','b','c','b','b','a','b','c','c']})
How can we return a dataframe like the following, which keeps only the last row of each run of consecutive equal y values?
df2 = pd.DataFrame({'x':[1,2,3,5,6,7,9],'y':['a','b','c','b','a','b','c']})
Is there an efficient way to do this with column operations instead of looping through each row?
Upvotes: 0
Views: 240
Reputation: 61920
You need to find every place where a value differs from the next one, so comparing each row to the following row is sufficient:
mask = df1['y'].ne(df1['y'].shift(-1))
df2 = df1[mask].reset_index(drop=True)
print(df2)
Output:
   x  y
0  1  a
1  2  b
2  3  c
3  5  b
4  6  a
5  7  b
6  9  c
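To see why comparing with the next row works, it can help to look at the intermediate mask. This is only a sketch on the question's sample data; the names nxt and kept_x are made up for illustration and are not part of the answer above.

```python
import pandas as pd

df1 = pd.DataFrame({'x': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                    'y': ['a', 'a', 'b', 'c', 'b', 'b', 'a', 'b', 'c', 'c']})

# Value of y on the following row; the last row has no successor, so it gets a missing value
nxt = df1['y'].shift(-1)

# True wherever a row differs from the row after it, i.e. the row closes a run.
# The last row compares against a missing value, which counts as "different".
mask = df1['y'].ne(nxt)

# x values of the rows that survive the filter
kept_x = df1.loc[mask, 'x'].tolist()
print(mask.tolist())
print(kept_x)
```

Because the last row is compared against a missing value, it is always kept, which is what you want here since it necessarily ends the final run.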
Upvotes: 3
Reputation: 150785
A classic application of cumsum to find consecutive blocks:
df1.groupby(df1['y'].ne(df1['y'].shift()).cumsum(), as_index=False).last()
Output:
   x  y
0  1  a
1  2  b
2  3  c
3  5  b
4  6  a
5  7  b
6  9  c
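For anyone unsure what the cumsum key looks like, here is a rough step-by-step version on the question's data. The names block_id and result are just illustrative, and this sketch uses reset_index(drop=True) instead of as_index=False purely to keep the steps explicit; it is a variant, not the exact one-liner above.

```python
import pandas as pd

df1 = pd.DataFrame({'x': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                    'y': ['a', 'a', 'b', 'c', 'b', 'b', 'a', 'b', 'c', 'c']})

# True at the first row of every run (the row differs from the one before it)
starts = df1['y'].ne(df1['y'].shift())

# Running count of run starts: every row in the same run gets the same integer
block_id = starts.cumsum().rename('block')

# Take the last row of each block, then discard the block labels from the index
result = df1.groupby(block_id).last().reset_index(drop=True)

print(block_id.tolist())
print(result)
```

Once you have the block ids, swapping last() for first() would give the first row of each run instead.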
Upvotes: 2