boshek
boshek

Reputation: 4416

label latest sequence of ascending numbers

I have a scenario where I want to label the most recent seq of numbers. The only thing I know about the seq is that it is ascending. When it stops doing that, I'd like a label. But I only need a label for the most recent sequence of ascending numbers. So here is an example of the data I have:

library(tibble)

n <- 5

x <- tibble(
  date = seq.Date(as.Date("2022-10-05"), Sys.Date(), by = "days"),
  split_seq = c(seq(1, 10, length.out = n), seq(1, 10, length.out = length(date) - n))
)

x
#> # A tibble: 13 × 2
#>    date       split_seq
#>    <date>         <dbl>
#>  1 2022-10-05      1   
#>  2 2022-10-06      3.25
#>  3 2022-10-07      5.5 
#>  4 2022-10-08      7.75
#>  5 2022-10-09     10   
#>  6 2022-10-10      1   
#>  7 2022-10-11      2.29
#>  8 2022-10-12      3.57
#>  9 2022-10-13      4.86
#> 10 2022-10-14      6.14
#> 11 2022-10-15      7.43
#> 12 2022-10-16      8.71
#> 13 2022-10-17     10

This is an example of what I'd like with the label_col added in:

#> # A tibble: 13 × 3
#>    date       split_seq label_col         
#>    <date>         <dbl> <chr>             
#>  1 2022-10-05      1    not last seq label
#>  2 2022-10-06      3.25 not last seq label
#>  3 2022-10-07      5.5  not last seq label
#>  4 2022-10-08      7.75 not last seq label
#>  5 2022-10-09     10    not last seq label
#>  6 2022-10-10      1    last seq label    
#>  7 2022-10-11      2.29 last seq label    
#>  8 2022-10-12      3.57 last seq label    
#>  9 2022-10-13      4.86 last seq label    
#> 10 2022-10-14      6.14 last seq label    
#> 11 2022-10-15      7.43 last seq label    
#> 12 2022-10-16      8.71 last seq label    
#> 13 2022-10-17     10    last seq label

A dplyr specific answer would be nice though not required.

Upvotes: 1

Views: 59

Answers (2)

TimTeaFan
TimTeaFan

Reputation: 18581

My approach is similar, but has less intermediate variables:

library(dplyr)

x %>% 
  mutate(col_label = cumsum(!split_seq > lag(split_seq, default = first(split_seq))) %>% 
           { ifelse(. == max(.),
                  "last seq label",
                  "not last seq label") }
         )

#> # A tibble: 13 × 3
#>    date       split_seq col_label         
#>    <date>         <dbl> <chr>             
#>  1 2022-10-05      1    not last seq label
#>  2 2022-10-06      3.25 not last seq label
#>  3 2022-10-07      5.5  not last seq label
#>  4 2022-10-08      7.75 not last seq label
#>  5 2022-10-09     10    not last seq label
#>  6 2022-10-10      1    last seq label    
#>  7 2022-10-11      2.29 last seq label    
#>  8 2022-10-12      3.57 last seq label    
#>  9 2022-10-13      4.86 last seq label    
#> 10 2022-10-14      6.14 last seq label    
#> 11 2022-10-15      7.43 last seq label    
#> 12 2022-10-16      8.71 last seq label    
#> 13 2022-10-17     10    last seq label

Created on 2022-10-17 with reprex v2.0.2

Upvotes: 0

wurli
wurli

Reputation: 2758

Something like this?

library(dplyr, warn.conflicts = FALSE)

x |> 
  mutate(
    is_increase = lag(split_seq) < split_seq,
    is_first_seq = !as.logical(cumsum(!coalesce(is_increase, TRUE))),
    label_col = if_else(
      is_first_seq,
      "not last seq label",
      "last seq label"
    )
  ) |> 
  select(-c(is_increase, is_first_seq))
#> # A tibble: 13 × 3
#>    date       split_seq label_col         
#>    <date>         <dbl> <chr>             
#>  1 2022-10-05      1    not last seq label
#>  2 2022-10-06      3.25 not last seq label
#>  3 2022-10-07      5.5  not last seq label
#>  4 2022-10-08      7.75 not last seq label
#>  5 2022-10-09     10    not last seq label
#>  6 2022-10-10      1    last seq label    
#>  7 2022-10-11      2.29 last seq label    
#>  8 2022-10-12      3.57 last seq label    
#>  9 2022-10-13      4.86 last seq label    
#> 10 2022-10-14      6.14 last seq label    
#> 11 2022-10-15      7.43 last seq label    
#> 12 2022-10-16      8.71 last seq label    
#> 13 2022-10-17     10    last seq label

Created on 2022-10-17 with reprex v2.0.2

Note that it might make for clearer code if you abstract the above logic into a function. E.g. something like this:

index_sequences <- function(x) {
  cumsum(coalesce(lag(x) > x, 1)) 
}

index_sequences(c(1:4, 3:5, 1:3))
#>  [1] 1 1 1 1 2 2 2 3 3 3

Created on 2022-10-17 with reprex v2.0.2

Upvotes: 1

Related Questions