larry77
larry77

Reputation: 1533

R: apply lagged calculation to all numeric variables in a tibble

Please have a look at the reprex at the end of the post. I slighly modified a sophisticated (for me!) function to calculated lagged variables and add them to an existing tibble while giving them a custom name. The ideas come from

https://purrple.cat/blog/2018/03/02/multiple-lags-with-tidy-evaluation/

However, when I try to apply this function to all the numeric variables in a tibble, I get an error message and I bang my head against the wall. Any suggestion is appreciated.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(rlang)
library(purrr)
#> 
#> Attaching package: 'purrr'
#> The following objects are masked from 'package:rlang':
#> 
#>     %@%, flatten, flatten_chr, flatten_dbl, flatten_int, flatten_lgl,
#>     flatten_raw, invoke, splice

df <- tibble(x=LETTERS[1:20], y=1:20, z=31:50)

df |> glimpse()
#> Rows: 20
#> Columns: 3
#> $ x <chr> "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N"…
#> $ y <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
#> $ z <int> 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, …


## See https://purrple.cat/blog/2018/03/02/multiple-lags-with-tidy-evaluation/


mylags <- function(data, variable, n=10){
  variable <- enquo(variable)
  
  indices <- seq_len(n)
  quosures <- map( indices, ~quo(lag(!!variable, !!.x)) ) %>% 
    set_names(sprintf("lag_%s_%02d", quo_text(variable), indices))
  
  mutate( data, !!!quosures )
  
}



df_lag <- df |>
    mylags(y,3)


df_lag |> glimpse()  ##this works as intended, but...
#> Rows: 20
#> Columns: 6
#> $ x        <chr> "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "…
#> $ y        <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
#> $ z        <int> 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 4…
#> $ lag_y_01 <int> NA, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17…
#> $ lag_y_02 <int> NA, NA, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16…
#> $ lag_y_03 <int> NA, NA, NA, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15…


df_lag2 <- df |>
    mutate(across(where(is.numeric), ~mylags(.x,3))) ##but this fails
#> Error in `mutate()`:
#> ℹ In argument: `across(where(is.numeric), ~mylags(.x, 3))`.
#> Caused by error in `across()`:
#> ! Can't compute column `y`.
#> Caused by error in `UseMethod()`:
#> ! no applicable method for 'mutate' applied to an object of class "c('integer', 'numeric')"
## and I do not understand why.

rlang::last_trace()
#> Error: Can't show last error because no error was recorded yet

sessionInfo()
#> R version 4.3.3 (2024-02-29)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Debian GNU/Linux 12 (bookworm)
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.11.0 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
#>  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
#>  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Europe/Brussels
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] purrr_1.0.2 rlang_1.1.3 dplyr_1.1.4
#> 
#> loaded via a namespace (and not attached):
#>  [1] vctrs_0.6.5       cli_3.6.2         knitr_1.45        xfun_0.42        
#>  [5] styler_1.10.2     generics_0.1.3    glue_1.7.0        htmltools_0.5.7  
#>  [9] fansi_1.0.6       rmarkdown_2.25    R.cache_0.16.0    tibble_3.2.1     
#> [13] evaluate_0.23     fastmap_1.1.1     yaml_2.3.8        lifecycle_1.0.4  
#> [17] compiler_4.3.3    fs_1.6.3          pkgconfig_2.0.3   R.oo_1.26.0      
#> [21] R.utils_2.12.3    digest_0.6.34     R6_2.5.1          tidyselect_1.2.0 
#> [25] reprex_2.1.0      utf8_1.2.4        pillar_1.9.0      magrittr_2.0.3   
#> [29] R.methodsS3_1.8.2 tools_4.3.3       withr_3.0.0

Created on 2024-03-06 with reprex v2.1.0

Upvotes: 0

Views: 56

Answers (1)

I_O
I_O

Reputation: 6921

Custom names can be generated quite flexibly with the .names argument of across. An example combining the source column and the function applied:

library(dplyr)

iris |>
  select(starts_with('Sepal')) |>
  mutate(across(is.numeric,
                .fns = list(lag_3 = \(xs) lag(xs, 3),
                            lag_10 = \(xs) lag(xs, 10)
                            ),
                .names = "{.col}_{.fn}"
         )
         ) |>
  tail()
##     Sepal.Length Sepal.Width Sepal.Length_lag_3 Sepal.Length_lag_10
## 150          5.9           3                6.3                 6.9
##     Sepal.Width_lag_3 Sepal.Width_lag_10
## 150               2.5                3.1

If there's only one function to be applied (only source column varies), the expression reduces to, e. g.:

iris |>
  select(starts_with('Sepal')) |>
  mutate(across(is.numeric,
                ~ lag(.x, 3),
                .names = "{.col}_lag3"
         )
         ) |>
  tail(1)
##     Sepal.Length Sepal.Width Sepal.Length_lag3 Sepal.Width_lag3
## 150          5.9           3               6.3              2.5

Upvotes: 1

Related Questions