Reputation: 93
I have a dataframe that looks like this:
Time x y
00:10:00 5.4 4.2
00:20:00 6.2 5.2
00:30:00 4.1 5.2
00:40:00 5.8 5.2
00:50:00 5.1 3.8
I need to find a way to remove the consecutive repeating values (5.2) in y. I can't use df.drop_duplicates() as that would also remove genuine, non-consecutive 5.2 values from the data. I'd rather not iterate through each row, as it is a very large dataframe and that feels like poor pandas practice. I'm hoping there's a nice method I'm missing, but I haven't found one in my search so far.
Many Thanks
Upvotes: 1
Views: 385
Reputation: 42946
If I understand you correctly, you want to drop consecutive duplicates. We can use boolean indexing with .shift and .ne here.
Note: I extended your dataframe by one row to show that the method works:
# Extended example dataframe
Time x y
0 00:10:00 5.4 4.2
1 00:20:00 6.2 5.2
2 00:30:00 4.1 5.2
3 00:40:00 5.8 5.2
4 00:50:00 5.1 3.8
5 00:60:00 3.3 5.2
m = df['y'].shift().ne(df['y'])
df[m]
Time x y
0 00:10:00 5.4 4.2
1 00:20:00 6.2 5.2
4 00:50:00 5.1 3.8
5 00:60:00 3.3 5.2
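For anyone who wants to run this end to end, here is a minimal self-contained sketch. The dataframe is rebuilt by hand from the extended example above, and the names prev_y and result are only illustrative; they are not part of the original snippet.

```python
import pandas as pd

# Rebuild the extended example dataframe (values copied from the table above).
df = pd.DataFrame({
    "Time": ["00:10:00", "00:20:00", "00:30:00", "00:40:00", "00:50:00", "00:60:00"],
    "x": [5.4, 6.2, 4.1, 5.8, 5.1, 3.3],
    "y": [4.2, 5.2, 5.2, 5.2, 3.8, 5.2],
})

# shift() moves y down by one row, so each row lines up with the value above it.
# The first row has no predecessor and gets NaN.
prev_y = df["y"].shift()

# True where y differs from the previous row. NaN never equals anything,
# so the first row is always kept.
m = prev_y.ne(df["y"])

# Boolean indexing keeps only the rows where the mask is True.
result = df[m]
print(result)
```

Only the first row of each run of equal y values survives, so the later, non-consecutive 5.2 in row 5 is kept.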
ne is the equivalent of != and stands for "not equal":
df['x'] != 5.4
df['x'].ne(5.4)
0 False
1 True
2 True
3 True
4 True
5 True
Name: x, dtype: bool
0 False
1 True
2 True
3 True
4 True
5 True
Name: x, dtype: bool
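As a quick check that the two spellings really are interchangeable, here is a small sketch that compares them directly. The series is rebuilt by hand from the x column above, and the variable names are just for illustration.

```python
import pandas as pd

# The x column from the extended example, rebuilt as a standalone Series.
x = pd.Series([5.4, 6.2, 4.1, 5.8, 5.1, 3.3], name="x")

with_operator = x != 5.4   # operator form
with_method = x.ne(5.4)    # method form

# Series.equals returns True only if both boolean Series match element by element.
same = with_operator.equals(with_method)
print(same)  # True
```

The method form is mostly a matter of taste; it just chains more comfortably after .shift() than wrapping the expression in parentheses to use !=.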
Upvotes: 5