Delete group based if there are 3 consecutive identical values and if the leading value is less than the previous

Question

I just need to delete, by group, rows that contain the same value 3 times in a row and leading rows that are less than previous rows. I've tried dplyr's lead function, but have been unsuccessful. The example data are just a subset of thousands of rows where a conditional will work best.

Data Before

print(df)
   sample measurement
1       a      0.1443
2       a      0.2220
3       a      0.3330
4       a      0.9435
5       a      0.8051 # Delete sample "a" as value is less than previous
6       b      0.1554
7       b      0.2775
8       b      0.3885
9       b      1.2210
10      b      1.8093
11      c      0.0000
12      c      0.0000
13      c      0.0000 # Delete sample "c" as there a 3 consecutive values in a row
14      c      0.0333
15      c      0.2997

Desired Output

   sample measurement
1       b      0.1554
2       b      0.2775
3       b      0.3885
4       b      1.2210
5       b      1.8093

Failed Attempts

Here, I tried to just filter any leading measurement that was greater or equal to the previous row, but failed.

df %>%
    group_by(sample) %>%
        filter(!any(lead(measurement) <= measurement)) %>% 
            ungroup()

# A tibble: 0 x 2
# ... with 2 variables: sample , measurement

If I tried to just collect the rows that fit the condition the above code does what is intended. I'm sure there's a better way to do this.

df %>%
    group_by(sample) %>%
        filter(any(lead(measurement) <= measurement)) %>% 
            ungroup()

   sample measurement
           
 1 a           0.144 
 2 a           0.222 
 3 a           0.333 
 4 a           0.944 
 5 a           0.805 
 6 c           0     
 7 c           0     
 8 c           0     
 9 c           0.0333
10 c           0.300

Reproducible Code

structure(list(sample = c("a", "a", "a", "a", "a", "b", "b", 
"b", "b", "b", "c", "c", "c", "c", "c"), measurement = c(0.1443, 
0.222, 0.333, 0.9435, 0.8051, 0.1554, 0.2775, 0.3885, 1.221, 
1.8093, 0, 0, 0, 0.0333, 0.2997)), row.names = c(NA, -15L), na.action = structure(c(`1` = 1L, 
`2` = 2L, `3` = 3L, `5` = 5L, `6` = 6L, `8` = 8L, `9` = 9L, `11` = 11L, 
`12` = 12L, `15` = 15L, `16` = 16L, `17` = 17L, `18` = 18L, `20` = 20L, 
`21` = 21L, `23` = 23L, `24` = 24L, `26` = 26L, `27` = 27L, `30` = 30L, 
`31` = 31L, `32` = 32L, `33` = 33L, `35` = 35L, `36` = 36L, `38` = 38L, 
`39` = 39L, `41` = 41L, `42` = 42L, `45` = 45L), class = "omit"), class = "data.frame")

akrun · Accepted Answer

One option is to do two checks -1) with diff that gets the difference of adjacent elements, check if any value is less than 0 or get the run-length-id (rleid) of 'measurement', tabulate and check if any frequency is greater than or equal to 3 after grouping by 'sample'

library(dplyr)
library(data.table)
df %>%
    group_by(sample) %>%
    filter( !(any(diff(measurement) < 0)| any(tabulate(rleid(measurement)) >=3)))
# A tibble: 5 x 2
# Groups:   sample [1]
#  sample measurement
#          
#1 b            0.155
#2 b            0.278
#3 b            0.388
#4 b            1.22 
#5 b            1.81

Delete group based if there are 3 consecutive identical values and if the leading value is less than the previous

Answers (2)

Related Questions