Reputation: 1196
I just need to delete, by group, rows that contain the same value 3 times in a row and leading rows that are less than previous rows. I've tried dplyr
's lead
function, but have been unsuccessful. The example data are just a subset of thousands of rows where a conditional will work best.
Data Before
print(df)
sample measurement
1 a 0.1443
2 a 0.2220
3 a 0.3330
4 a 0.9435
5 a 0.8051 # Delete sample "a" as value is less than previous
6 b 0.1554
7 b 0.2775
8 b 0.3885
9 b 1.2210
10 b 1.8093
11 c 0.0000
12 c 0.0000
13 c 0.0000 # Delete sample "c" as there a 3 consecutive values in a row
14 c 0.0333
15 c 0.2997
Desired Output
sample measurement
1 b 0.1554
2 b 0.2775
3 b 0.3885
4 b 1.2210
5 b 1.8093
Failed Attempts
Here, I tried to just filter any leading measurement that was greater or equal to the previous row, but failed.
df %>%
group_by(sample) %>%
filter(!any(lead(measurement) <= measurement)) %>%
ungroup()
# A tibble: 0 x 2
# ... with 2 variables: sample <chr>, measurement <dbl>
If I tried to just collect the rows that fit the condition the above code does what is intended. I'm sure there's a better way to do this.
df %>%
group_by(sample) %>%
filter(any(lead(measurement) <= measurement)) %>%
ungroup()
sample measurement
<chr> <dbl>
1 a 0.144
2 a 0.222
3 a 0.333
4 a 0.944
5 a 0.805
6 c 0
7 c 0
8 c 0
9 c 0.0333
10 c 0.300
Reproducible Code
structure(list(sample = c("a", "a", "a", "a", "a", "b", "b",
"b", "b", "b", "c", "c", "c", "c", "c"), measurement = c(0.1443,
0.222, 0.333, 0.9435, 0.8051, 0.1554, 0.2775, 0.3885, 1.221,
1.8093, 0, 0, 0, 0.0333, 0.2997)), row.names = c(NA, -15L), na.action = structure(c(`1` = 1L,
`2` = 2L, `3` = 3L, `5` = 5L, `6` = 6L, `8` = 8L, `9` = 9L, `11` = 11L,
`12` = 12L, `15` = 15L, `16` = 16L, `17` = 17L, `18` = 18L, `20` = 20L,
`21` = 21L, `23` = 23L, `24` = 24L, `26` = 26L, `27` = 27L, `30` = 30L,
`31` = 31L, `32` = 32L, `33` = 33L, `35` = 35L, `36` = 36L, `38` = 38L,
`39` = 39L, `41` = 41L, `42` = 42L, `45` = 45L), class = "omit"), class = "data.frame")
Upvotes: 1
Views: 39
Reputation: 32548
d %>%
group_by(sample) %>%
filter(!any(measurement - lag(measurement, default = 0) <= 0) & n() >= 3) %>%
ungroup()
Upvotes: 2
Reputation: 887128
One option is to do two checks -1) with diff
that gets the difference of adjacent elements, check if any
value is less than 0 or get the run-length-id (rleid
) of 'measurement', tabulate
and check if any
frequency is greater than or equal to 3 after grouping by 'sample'
library(dplyr)
library(data.table)
df %>%
group_by(sample) %>%
filter( !(any(diff(measurement) < 0)| any(tabulate(rleid(measurement)) >=3)))
# A tibble: 5 x 2
# Groups: sample [1]
# sample measurement
# <chr> <dbl>
#1 b 0.155
#2 b 0.278
#3 b 0.388
#4 b 1.22
#5 b 1.81
Upvotes: 2