How to remove repeating values in a pandas dataframe

Question

I have a dataframe that looks like this:

Time       x     y
00:10:00   5.4   4.2
00:20:00   6.2   5.2
00:30:00   4.1   5.2
00:40:00   5.8   5.2
00:50:00   5.1   3.8

I need to find a way to remove the consecutive repeating values (5.2) in y. I can't use df.drop_duplicates() because that would also remove genuine 5.2 values from the data. I'd rather not iterate through each row, since it is a very large dataframe and that feels like poor pandas practice. I'm hoping there's a nice method I'm missing, but I haven't found one in my search so far.

Many thanks
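To illustrate why drop_duplicates is not suitable here, the minimal sketch below builds a dataframe like the one above plus one hypothetical extra row in which 5.2 appears again after a different value. The extra row (01:00:00, x = 3.3, y = 5.2) is an assumption for illustration only. drop_duplicates(subset="y") keeps only the first occurrence of each y value anywhere in the column, so it drops the later, non-consecutive 5.2 as well as the consecutive repeats.

```python
import pandas as pd

# Question's data plus a hypothetical extra row where 5.2 reappears later
df = pd.DataFrame({
    "Time": ["00:10:00", "00:20:00", "00:30:00", "00:40:00", "00:50:00", "01:00:00"],
    "x": [5.4, 6.2, 4.1, 5.8, 5.1, 3.3],
    "y": [4.2, 5.2, 5.2, 5.2, 3.8, 5.2],
})

# drop_duplicates keeps only the first occurrence of each y value in the whole column,
# so row 5 (a genuine, non-consecutive 5.2) is removed along with rows 2 and 3
deduped = df.drop_duplicates(subset="y")
print(deduped)
```

With the original five rows alone, drop_duplicates happens to give the desired result; the problem only shows up once a value recurs later in the data.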

Erfan · Accepted Answer

If I understand you correctly, you want to drop consecutive duplicates. We can use boolean indexing with .shift and .ne for this.

Note: I extended your dataframe by one row, in which 5.2 appears again after a different value, to show that the method keeps non-consecutive repeats:

# Extended example dataframe
       Time    x    y
0  00:10:00  5.4  4.2
1  00:20:00  6.2  5.2
2  00:30:00  4.1  5.2
3  00:40:00  5.8  5.2
4  00:50:00  5.1  3.8
5  01:00:00  3.3  5.2

# True where y differs from the value in the previous row
m = df['y'].shift().ne(df['y'])
# Keep only those rows
df[m]

       Time    x    y
0  00:10:00  5.4  4.2
1  00:20:00  6.2  5.2
4  00:50:00  5.1  3.8
5  01:00:00  3.3  5.2
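For anyone who wants to run this end to end, here is a self-contained sketch of the same shift/ne approach. It rebuilds the extended example dataframe by hand; the last row's time is written as 01:00:00, an assumed valid clock time for the row after 00:50:00. It also prints the shifted column next to the mask, which makes it visible why rows 2 and 3 are dropped: their y equals the previous row's y.

```python
import pandas as pd

df = pd.DataFrame({
    "Time": ["00:10:00", "00:20:00", "00:30:00", "00:40:00", "00:50:00", "01:00:00"],
    "x": [5.4, 6.2, 4.1, 5.8, 5.1, 3.3],
    "y": [4.2, 5.2, 5.2, 5.2, 3.8, 5.2],
})

# shift() moves y down by one row; row 0 gets NaN because it has no predecessor
prev_y = df["y"].shift()

# True where y differs from the previous row's y.
# NaN compares as "not equal" to everything, so row 0 is always kept.
m = prev_y.ne(df["y"])

# Side-by-side view of the comparison that builds the mask
print(pd.DataFrame({"y": df["y"], "prev_y": prev_y, "keep": m}))

result = df[m]
print(result)
```

Because each row is compared only with its immediate predecessor, the 5.2 in the last row survives: the row before it holds 3.8. The whole operation is vectorised, so no Python-level loop over rows is needed.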

ne is equivalent to the != operator and stands for "not equal":

df['x'] != 5.4     # operator form
df['x'].ne(5.4)    # method form

Both expressions return the same boolean Series:

0    False
1     True
2     True
3     True
4     True
5     True
Name: x, dtype: bool
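A quick way to confirm the equivalence is to compare the two results directly. The sketch below assumes only the x column from the example and checks both a scalar operand and a Series operand (the shifted column), the latter being the form the answer relies on. The method form is mainly a readability choice: it lets the comparison sit at the end of a chain such as df['y'].shift().ne(df['y']).

```python
import pandas as pd

x = pd.Series([5.4, 6.2, 4.1, 5.8, 5.1, 3.3], name="x")

# Scalar operand: operator form and method form give identical boolean Series
scalar_same = (x != 5.4).equals(x.ne(5.4))

# Series operand (aligned on the index), as used with shift() in the answer
other = x.shift()
series_same = (x != other).equals(x.ne(other))

print(scalar_same, series_same)
```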
