jayb

Reputation: 565

Assign value of maximum streak length to all rows in a streak

I have a data.frame like this:

> numeric_value <- c(10, 11, 21, 25, 15, 29, 30, 35, 9, 20, 21, 40, 22)
> threshold_reached <- c(F, F, T, T, F, T, T, T, F, T, T, T, T)
> dat <- data.frame(numeric_value, threshold_reached)

I want a new variable that gives, for every row in a streak, the total length of that streak (a streak being a run of consecutive TRUE values of threshold_reached). Rows outside a streak should get 0, like this:

> max_streak_length <- c(0, 0, 2, 2, 0, 3, 3, 3, 0, 4, 4, 4, 4)
> data.frame(numeric_value, threshold_reached, max_streak_length)
   numeric_value threshold_reached max_streak_length
1             10             FALSE                 0
2             11             FALSE                 0
3             21              TRUE                 2
4             25              TRUE                 2
5             15             FALSE                 0
6             29              TRUE                 3
7             30              TRUE                 3
8             35              TRUE                 3
9              9             FALSE                 0
10            20              TRUE                 4
11            21              TRUE                 4
12            40              TRUE                 4
13            22              TRUE                 4

There are a few similar questions, which use the runner package or base R's rle() function. But I haven't found one that answers this specific problem, and I can't see a solution myself.

Preferably, I would like an answer using dplyr::mutate but this isn't essential.

Thanks!

Upvotes: 2

Views: 101

Answers (2)

Anoushiravan R

Reputation: 21938

I think you can use the following solution:

library(dplyr)
library(data.table)

dat %>%
  # rleid() gives every run of identical consecutive values its own id
  mutate(rles = rleid(threshold_reached)) %>%
  group_by(rles) %>%
  # n() is the run length; runs of FALSE are set to 0
  mutate(max_streak_length = ifelse(!threshold_reached, 0, n())) %>%
  # ungroup first, otherwise select() keeps the grouping column rles
  ungroup() %>%
  select(-rles)

# A tibble: 13 x 3
   numeric_value threshold_reached max_streak_length
           <dbl> <lgl>                         <dbl>
 1            10 FALSE                             0
 2            11 FALSE                             0
 3            21 TRUE                              2
 4            25 TRUE                              2
 5            15 FALSE                             0
 6            29 TRUE                              3
 7            30 TRUE                              3
 8            35 TRUE                              3
 9             9 FALSE                             0
10            20 TRUE                              4
11            21 TRUE                              4
12            40 TRUE                              4
13            22 TRUE                              4
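
If you would rather not load data.table just for rleid(), the same grouping idea can probably be written with dplyr alone. This is only a sketch, and it assumes dplyr 1.1.0 or later, where consecutive_id() is available. The names run_id and result are made up for illustration.

```r
library(dplyr)

# Same example data as in the question
dat <- data.frame(
  numeric_value = c(10, 11, 21, 25, 15, 29, 30, 35, 9, 20, 21, 40, 22),
  threshold_reached = c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE,
                        FALSE, TRUE, TRUE, TRUE, TRUE)
)

result <- dat %>%
  # consecutive_id() increments every time the value changes,
  # so each run of TRUE or FALSE becomes its own group (assumes dplyr >= 1.1.0)
  group_by(run_id = consecutive_id(threshold_reached)) %>%
  # n() is the size of the current run; FALSE runs get 0
  mutate(max_streak_length = if_else(threshold_reached, n(), 0L)) %>%
  # drop the grouping before removing the helper column
  ungroup() %>%
  select(-run_id)
```

Here if_else() is used instead of ifelse() so that both branches are integers and the column type stays predictable.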

Upvotes: 1

tmfmnk

Reputation: 40171

One option could be:

library(dplyr)

dat %>%
  mutate(max_streak_length = with(rle(threshold_reached), rep(values * lengths, lengths)))

   numeric_value threshold_reached max_streak_length
1             10             FALSE                 0
2             11             FALSE                 0
3             21              TRUE                 2
4             25              TRUE                 2
5             15             FALSE                 0
6             29              TRUE                 3
7             30              TRUE                 3
8             35              TRUE                 3
9              9             FALSE                 0
10            20              TRUE                 4
11            21              TRUE                 4
12            40              TRUE                 4
13            22              TRUE                 4
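
To see why the one-liner works, it may help to unpack it step by step in base R. The sketch below uses the vector from the question; the intermediate names runs and run_scores are only illustrative. It assumes threshold_reached has no NA values, which would need extra handling.

```r
threshold_reached <- c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE,
                       FALSE, TRUE, TRUE, TRUE, TRUE)

# rle() compresses the vector into runs of identical consecutive values
runs <- rle(threshold_reached)
runs$values   # FALSE TRUE FALSE TRUE FALSE TRUE
runs$lengths  # 2 2 1 3 1 4

# TRUE coerces to 1 and FALSE to 0, so multiplying zeroes out the FALSE runs
run_scores <- runs$values * runs$lengths  # 0 2 0 3 0 4

# Repeat each run's score once per row of that run to get back to 13 values
max_streak_length <- rep(run_scores, runs$lengths)
max_streak_length  # 0 0 2 2 0 3 3 3 0 4 4 4 4
```

with() in the one-liner simply lets values and lengths be referenced without the runs$ prefix.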

Upvotes: 1
