devPottr

Reputation: 15

Compare 2 consecutive cells in a dataframe for equality

I want to detect whether 2 or more consecutive values in a column of a dataframe are greater than 0.5. My approach so far: I check whether each cell's value is less than 0.5 and store the result in the column "condition" (see table). How can I detect whether 2 consecutive cells in that column have the same value (rows 4-5)? Or is it possible to detect this directly in the "data" column? If 2 consecutive cells are False, the dataframe can be discarded.

I would be very grateful for any help!

data condition
0 0.1 True
1 0.1 True
2 0.25 True
3 0.3 True
4 0.6 False
5 0.7 False
6 0.3 True
7 0.1 True
8 0.9 False
9 0.1 True

Upvotes: 1

Views: 769

Answers (2)

mozway

Reputation: 260690

You can compute a boolean series that is True where a value is greater than 0.5 (i.e., invalid). Then apply a boolean AND (&) between this series and its shifted copy: the result is True wherever two consecutive values are both True. Finally, check whether any such position exists to decide whether to discard the dataset:

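s = df['data'].gt(0.5)  # True where the value is invalid (> 0.5)
(s & s.shift(fill_value=False)).any()  # True if two consecutive values are invalid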

Output: True -> the dataset is invalid
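For reference, here is a minimal self-contained sketch of the same idea, with the data column copied from the question's table. The variable names (invalid, consecutive, discard) are only illustrative, and fill_value=False is an optional addition that keeps the shifted series boolean instead of introducing NaN in the first row:

```python
import pandas as pd

# Data column copied from the table in the question.
df = pd.DataFrame({"data": [0.1, 0.1, 0.25, 0.3, 0.6, 0.7, 0.3, 0.1, 0.9, 0.1]})

# True where a value is above the threshold (i.e. invalid).
invalid = df["data"].gt(0.5)

# A row is flagged when it and the row before it are both invalid.
# fill_value=False means the first row has no invalid predecessor.
consecutive = invalid & invalid.shift(fill_value=False)

# Discard the dataframe if at least one such pair exists.
discard = bool(consecutive.any())
print(discard)
```

The only flagged position is row 5, because rows 4 and 5 (0.6 and 0.7) both exceed 0.5. The isolated 0.9 further down does not trigger the check.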

Upvotes: 3

James

Reputation: 36623

You can use the .diff method and check whether the result is equal to zero: a difference of zero means the cell has the same value as the one before it.

df['eq_to_prev'] = df.data.diff().eq(0)
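Note that this compares the raw values, which is not the same as both values exceeding 0.5. As a rough sketch (the column names same_flag and two_false are made up for illustration, and the condition column is rebuilt as described in the question), the same comparison can be applied to the boolean column by comparing it with its shifted copy:

```python
import pandas as pd

# Data column copied from the table in the question.
df = pd.DataFrame({"data": [0.1, 0.1, 0.25, 0.3, 0.6, 0.7, 0.3, 0.1, 0.9, 0.1]})
df["condition"] = df["data"].lt(0.5)

# Equality of the raw values: only row 1 repeats its predecessor (0.1, 0.1).
df["eq_to_prev"] = df["data"].diff().eq(0)

# Equality of the flags: compare the column with its shifted copy.
# The first row has no predecessor, so it compares as False.
df["same_flag"] = df["condition"].eq(df["condition"].shift())

# Two consecutive False flags: same flag as the previous row AND currently False.
df["two_false"] = df["same_flag"] & ~df["condition"]

print(df)
```

With the question's data, eq_to_prev is True only for row 1, while two_false is True only for row 5, which is the pair the question is actually after.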

Upvotes: 0
