Reputation: 4416
I have a scenario where I want to label the most recent seq of numbers. The only thing I know about the seq is that it is ascending. When it stops doing that, I'd like a label. But I only need a label for the most recent sequence of ascending numbers. So here is an example of the data I have:
library(tibble)
n <- 5
x <- tibble(
date = seq.Date(as.Date("2022-10-05"), Sys.Date(), by = "days"),
split_seq = c(seq(1, 10, length.out = n), seq(1, 10, length.out = length(date) - n))
)
x
#> # A tibble: 13 × 2
#> date split_seq
#> <date> <dbl>
#> 1 2022-10-05 1
#> 2 2022-10-06 3.25
#> 3 2022-10-07 5.5
#> 4 2022-10-08 7.75
#> 5 2022-10-09 10
#> 6 2022-10-10 1
#> 7 2022-10-11 2.29
#> 8 2022-10-12 3.57
#> 9 2022-10-13 4.86
#> 10 2022-10-14 6.14
#> 11 2022-10-15 7.43
#> 12 2022-10-16 8.71
#> 13 2022-10-17 10
This is an example of what I'd like with the label_col
added in:
#> # A tibble: 13 × 3
#> date split_seq label_col
#> <date> <dbl> <chr>
#> 1 2022-10-05 1 not last seq label
#> 2 2022-10-06 3.25 not last seq label
#> 3 2022-10-07 5.5 not last seq label
#> 4 2022-10-08 7.75 not last seq label
#> 5 2022-10-09 10 not last seq label
#> 6 2022-10-10 1 last seq label
#> 7 2022-10-11 2.29 last seq label
#> 8 2022-10-12 3.57 last seq label
#> 9 2022-10-13 4.86 last seq label
#> 10 2022-10-14 6.14 last seq label
#> 11 2022-10-15 7.43 last seq label
#> 12 2022-10-16 8.71 last seq label
#> 13 2022-10-17 10 last seq label
A dplyr
specific answer would be nice though not required.
Upvotes: 1
Views: 59
Reputation: 18581
My approach is similar, but has less intermediate variables:
library(dplyr)
x %>%
mutate(col_label = cumsum(!split_seq > lag(split_seq, default = first(split_seq))) %>%
{ ifelse(. == max(.),
"last seq label",
"not last seq label") }
)
#> # A tibble: 13 × 3
#> date split_seq col_label
#> <date> <dbl> <chr>
#> 1 2022-10-05 1 not last seq label
#> 2 2022-10-06 3.25 not last seq label
#> 3 2022-10-07 5.5 not last seq label
#> 4 2022-10-08 7.75 not last seq label
#> 5 2022-10-09 10 not last seq label
#> 6 2022-10-10 1 last seq label
#> 7 2022-10-11 2.29 last seq label
#> 8 2022-10-12 3.57 last seq label
#> 9 2022-10-13 4.86 last seq label
#> 10 2022-10-14 6.14 last seq label
#> 11 2022-10-15 7.43 last seq label
#> 12 2022-10-16 8.71 last seq label
#> 13 2022-10-17 10 last seq label
Created on 2022-10-17 with reprex v2.0.2
Upvotes: 0
Reputation: 2758
Something like this?
library(dplyr, warn.conflicts = FALSE)
x |>
mutate(
is_increase = lag(split_seq) < split_seq,
is_first_seq = !as.logical(cumsum(!coalesce(is_increase, TRUE))),
label_col = if_else(
is_first_seq,
"not last seq label",
"last seq label"
)
) |>
select(-c(is_increase, is_first_seq))
#> # A tibble: 13 × 3
#> date split_seq label_col
#> <date> <dbl> <chr>
#> 1 2022-10-05 1 not last seq label
#> 2 2022-10-06 3.25 not last seq label
#> 3 2022-10-07 5.5 not last seq label
#> 4 2022-10-08 7.75 not last seq label
#> 5 2022-10-09 10 not last seq label
#> 6 2022-10-10 1 last seq label
#> 7 2022-10-11 2.29 last seq label
#> 8 2022-10-12 3.57 last seq label
#> 9 2022-10-13 4.86 last seq label
#> 10 2022-10-14 6.14 last seq label
#> 11 2022-10-15 7.43 last seq label
#> 12 2022-10-16 8.71 last seq label
#> 13 2022-10-17 10 last seq label
Created on 2022-10-17 with reprex v2.0.2
Note that it might make for clearer code if you abstract the above logic into a function. E.g. something like this:
index_sequences <- function(x) {
cumsum(coalesce(lag(x) > x, 1))
}
index_sequences(c(1:4, 3:5, 1:3))
#> [1] 1 1 1 1 2 2 2 3 3 3
Created on 2022-10-17 with reprex v2.0.2
Upvotes: 1