Reputation: 565
I have a data.frame like this:
> numeric_value <- c(10, 11, 21, 25, 15, 29, 30, 35, 9, 20, 21, 40, 22)
> threshold_reached <- c(F, F, T, T, F, T, T, T, F, T, T, T, T)
> dat <- data.frame(numeric_value, threshold_reached)
I want a new variable giving the length of the streak each row belongs to (a streak being a run of consecutive TRUE values of threshold_reached), with 0 for rows outside a streak, like this:
> max_streak_length <- c(0, 0, 2, 2, 0, 3, 3, 3, 0, 4, 4, 4, 4)
> data.frame(numeric_value, threshold_reached, max_streak_length)
numeric_value threshold_reached max_streak_length
1 10 FALSE 0
2 11 FALSE 0
3 21 TRUE 2
4 25 TRUE 2
5 15 FALSE 0
6 29 TRUE 3
7 30 TRUE 3
8 35 TRUE 3
9 9 FALSE 0
10 20 TRUE 4
11 21 TRUE 4
12 40 TRUE 4
13 22 TRUE 4
There are a few similar questions like this one and this one, which use the runner package or the base rle() function. But I haven't found one that answers this specific problem, and I can't see a solution myself.
Preferably, I would like an answer using dplyr::mutate, but this isn't essential.
Thanks!
Upvotes: 2
Views: 101
Reputation: 21938
I think you can use the following solution:
library(dplyr)
library(data.table)  # for rleid()
dat %>%
  mutate(rles = rleid(threshold_reached)) %>%  # run id: increments whenever the value changes
  group_by(rles) %>%
  mutate(max_streak_length = ifelse(!threshold_reached, 0, n())) %>%  # run length for TRUE runs, 0 otherwise
  ungroup() %>%  # without this, select() keeps the grouping column rles
  select(-rles)
# A tibble: 13 x 3
   numeric_value threshold_reached max_streak_length
           <dbl> <lgl>                         <dbl>
 1            10 FALSE                             0
 2            11 FALSE                             0
 3            21 TRUE                              2
 4            25 TRUE                              2
 5            15 FALSE                             0
 6            29 TRUE                              3
 7            30 TRUE                              3
 8            35 TRUE                              3
 9             9 FALSE                             0
10            20 TRUE                              4
11            21 TRUE                              4
12            40 TRUE                              4
13            22 TRUE                              4
Upvotes: 1
Reputation: 40171
One option could be:
dat %>%
  mutate(max_streak_length = with(rle(threshold_reached), rep(values * lengths, lengths)))
numeric_value threshold_reached max_streak_length
1 10 FALSE 0
2 11 FALSE 0
3 21 TRUE 2
4 25 TRUE 2
5 15 FALSE 0
6 29 TRUE 3
7 30 TRUE 3
8 35 TRUE 3
9 9 FALSE 0
10 20 TRUE 4
11 21 TRUE 4
12 40 TRUE 4
13 22 TRUE 4
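For anyone wondering how the one-liner works, here is a step-by-step sketch in base R. The intermediate names `runs` and `streak_per_run` are mine and only there for illustration; the vector is the one from the question.

```r
threshold_reached <- c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE,
                       FALSE, TRUE, TRUE, TRUE, TRUE)

# rle() compresses the vector into runs: one value and one length per run
runs <- rle(threshold_reached)

# Multiplying lengths by the logical values (TRUE = 1, FALSE = 0)
# zeroes out the FALSE runs and keeps the length of the TRUE runs
streak_per_run <- runs$values * runs$lengths

# rep() expands each per-run number back to one entry per original row
max_streak_length <- rep(streak_per_run, runs$lengths)
print(max_streak_length)
```

The result has one entry per row of the input, so it can be assigned directly as a new column or used inside mutate() as above.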
Upvotes: 1