Reputation: 15
I have the following problem: I want to detect whether 2 or more consecutive values in a column of a DataFrame are greater than 0.5. My approach so far is to check for each cell whether the value is less than 0.5 and store the result in a column "condition" (see table). How can I now detect in that column whether 2 consecutive cells have the same value (rows 4-5)? Or is it possible to detect this directly in the data column? If 2 consecutive cells are False, the DataFrame can be discarded.
I would be very grateful for any help!
|   | data | condition |
|---|---|---|
| 0 | 0.1 | True |
| 1 | 0.1 | True |
| 2 | 0.25 | True |
| 3 | 0.3 | True |
| 4 | 0.6 | False |
| 5 | 0.7 | False |
| 6 | 0.3 | True |
| 7 | 0.1 | True |
| 8 | 0.9 | False |
| 9 | 0.1 | True |
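A minimal sketch of the setup described in the question, assuming the sample values from the table and a default 0..9 index. It only rebuilds the "condition" column (True where the value is below 0.5) so the answers' snippets have something to run against:

```python
import pandas as pd

# Sample values from the table (index assumed to be a default 0..9 range)
df = pd.DataFrame({"data": [0.1, 0.1, 0.25, 0.3, 0.6, 0.7, 0.3, 0.1, 0.9, 0.1]})

# "condition" is True where the value is below 0.5, False otherwise
df["condition"] = df["data"].lt(0.5)

print(df)
```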
Upvotes: 1
Views: 769
Reputation: 260690
You can compute a boolean series that is True where a value is greater than 0.5 (i.e. True when invalid). Then apply a boolean AND (`&`) between this series and its `shift`: any two consecutive True values will yield True. Finally, check with `any` whether such a pair is present to decide whether to discard the dataset:
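```python
s = df['data'].gt(0.5)                   # True where the value is > 0.5 (invalid)
(s & s.shift(fill_value=False)).any()    # True if two consecutive values are invalid
```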
Output: `True`, i.e. the dataset is invalid.
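A self-contained sketch of the shift-based check on the sample data from the question, plus a variation for "n or more" consecutive values. The rolling-window variant and the helper name `has_run_above` are my own illustration, not part of the answer above: it counts invalid rows in each window of `n` rows, and a window where all `n` rows are invalid means a run of at least `n`. For `n=2` it gives the same result as `s & s.shift()`.

```python
import pandas as pd

def has_run_above(values: pd.Series, threshold: float = 0.5, n: int = 2) -> bool:
    """Hypothetical helper: True if at least n consecutive values exceed threshold."""
    invalid = values.gt(threshold).astype(int)  # 1 where the value is invalid
    # Sum of each n-row window equals n only when all n rows are invalid
    return bool(invalid.rolling(n).sum().eq(n).any())

df = pd.DataFrame({"data": [0.1, 0.1, 0.25, 0.3, 0.6, 0.7, 0.3, 0.1, 0.9, 0.1]})

# The two-row check from the answer; fill_value keeps the shifted series boolean
s = df["data"].gt(0.5)
pair_found = bool((s & s.shift(fill_value=False)).any())

print(pair_found)                       # rows 4 and 5 are both above 0.5
print(has_run_above(df["data"], n=2))   # same result via a rolling window
print(has_run_above(df["data"], n=3))   # no run of three values above 0.5
```

The `shift` version is the simplest for exactly two rows; the rolling window is only worth it if the required run length may change.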
Upvotes: 3
Reputation: 36623
You can use the `.diff` method and check whether the result is equal to zero, which flags each row whose value equals the previous row's value:
```python
# True where a value equals the value in the previous row
df['eq_to_prev'] = df.data.diff().eq(0)
```
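A sketch of what the `.diff` check flags on the sample data from the question. On the raw `data` column it marks rows with identical consecutive numbers (only row 1 here, 0.1 after 0.1), not runs above 0.5. To answer the question's "two consecutive False" version on the `condition` column, the sketch below swaps in `eq(shift())` (a variation, not the answer's exact code) and additionally requires the flag itself to be False, i.e. the value is not below 0.5:

```python
import pandas as pd

df = pd.DataFrame({"data": [0.1, 0.1, 0.25, 0.3, 0.6, 0.7, 0.3, 0.1, 0.9, 0.1]})
df["condition"] = df["data"].lt(0.5)

# The answer's check on the raw values: True where a value equals the previous one
df["eq_to_prev"] = df["data"].diff().eq(0)

# On the condition column: the row repeats the previous row's flag AND that flag
# is False, i.e. two consecutive values that are not below 0.5
cond = df["condition"]
df["two_false"] = cond.eq(cond.shift()) & ~cond

print(df)
print("discard:", df["two_false"].any())
```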
Upvotes: 0