Calculate multiple lags for each variable in a df and store the results into nested lists

Question

I have a df (data) that I want to pass as an argument to a function fun.lag_cols to calculate (for each column in df) several lags. The results must be stored in a nested list, but my function seems to be missing (at least) one step.

data <- data.frame(x1 = rnorm(10,0,1)
               , x2 = rnorm(10,2,3)
               , x3 = rnorm(10,6,1))

fun.lag_cols <- function(x, lag_from = 0, lag_to = 2) {
  x <- as.data.frame(x)
  cols_x <- ncol(x)
  lst_lag <- list()
  
  for (i in 1:cols_x) {
    for(j in lag_from:lag_to) {
      lst_lag[[i]] <- dplyr::lag(x[,i],j)
    }
    
  }
  return(lst_lag)
}

output <- fun.lag_cols(data)

In this particular example, I would like to see output as a list of 3 elements (x1, x2, x3), each element a new list of 3 (one per lag 0, 1, 2).

My code seems to store only lag2 (in general, the maximum lag) for each variable, clearly not the expected result.

I am open to different approaches, as long as they provide the final output (nested list).

Thanks

akrun · Accepted Answer

The assignment 'lst_lag[[i]] <- ...' overwrites the ith element on every pass of the inner loop, so only the last lag survives. Two changes fix the function: 1) initialize the output list with a predefined length (vector('list', ncol(x))); 2) inside the nested loop, append each lag to the ith list element by concatenating the existing element with the new lag wrapped in a list, and assign the result back to that same element (<-).
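
The difference between overwriting and appending can be seen in isolation with a small sketch. The names overwritten and appended are made up for illustration, and the loop stores the counter itself rather than a lagged vector:

```r
# Overwriting: each pass replaces the previous value, so only the last survives
overwritten <- vector("list", 1)
for (j in 0:2) {
  overwritten[[1]] <- j
}

# Appending: c() joins the existing element with the new value wrapped in list()
appended <- vector("list", 1)
for (j in 0:2) {
  appended[[1]] <- c(appended[[1]], list(j))
}

overwritten[[1]]       # 2
length(appended[[1]])  # 3
```

Wrapping the new value in list() matters: without it, c() would flatten all lags of a column into one long vector instead of keeping one child element per lag.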

fun.lag_cols <- function(x, lag_from = 0, lag_to = 2) {
  x <- as.data.frame(x)
  cols_x <- ncol(x)
  # pre-allocate one (initially NULL) slot per column
  lst_lag <- vector('list', ncol(x))

  for (i in seq_len(cols_x)) {
    for (j in lag_from:lag_to) {
      # append the j-th lag as a new child element instead of overwriting
      lst_lag[[i]] <- c(lst_lag[[i]], list(dplyr::lag(x[, i], j)))
    }
  }
  return(lst_lag)
}

Testing:

fun.lag_cols(data)
[[1]]
[[1]][[1]]
 [1] -1.40431393 -2.22551238  0.06090537  0.77941726  1.10733091  1.20657717  0.71614034 -0.17990135  0.22058894  0.33598415

[[1]][[2]]
 [1]          NA -1.40431393 -2.22551238  0.06090537  0.77941726  1.10733091  1.20657717  0.71614034 -0.17990135  0.22058894

[[1]][[3]]
 [1]          NA          NA -1.40431393 -2.22551238  0.06090537  0.77941726  1.10733091  1.20657717  0.71614034 -0.17990135


[[2]]
[[2]][[1]]
 [1]  1.1334651  1.2385579  1.8930347 -4.7379766  2.0169352  0.7210822 -1.0322536  4.5446643  1.4421923  1.1316508

[[2]][[2]]
 [1]         NA  1.1334651  1.2385579  1.8930347 -4.7379766  2.0169352  0.7210822 -1.0322536  4.5446643  1.4421923

[[2]][[3]]
 [1]         NA         NA  1.1334651  1.2385579  1.8930347 -4.7379766  2.0169352  0.7210822 -1.0322536  4.5446643


[[3]]
[[3]][[1]]
 [1] 4.324912 5.114774 4.517017 7.001338 5.218430 4.408571 7.233504 6.875883 5.848294 4.696724

[[3]][[2]]
 [1]       NA 4.324912 5.114774 4.517017 7.001338 5.218430 4.408571 7.233504 6.875883 5.848294

[[3]][[3]]
 [1]       NA       NA 4.324912 5.114774 4.517017 7.001338 5.218430 4.408571 7.233504 6.875883

There is already a function that does this: shift from data.table, which accepts a vector of lags for n.

library(data.table)
shift(data, n = 0:2)
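
As far as I can tell, shift applied to a whole data frame returns a single flat list (one element per column/lag pair) rather than one list per column. If the nested shape is required, a base R sketch with two nested lapply calls is one option. The names lag_vec, lag_cols_nested and demo below are hypothetical, and the helper assumes non-negative lags:

```r
# lag_vec (hypothetical helper): shift v down by k positions, padding with NA
lag_vec <- function(v, k) {
  n <- length(v)
  c(rep(NA, min(k, n)), head(v, max(n - k, 0)))
}

lag_cols_nested <- function(x, lags = 0:2) {
  x <- as.data.frame(x)
  # outer lapply: one element per column, names carried over from x
  lapply(x, function(col) {
    # inner lapply: one element per lag, named lag0, lag1, ...
    out <- lapply(lags, function(k) lag_vec(col, k))
    names(out) <- paste0("lag", lags)
    out
  })
}

demo <- data.frame(a = 1:4, b = c(10, 20, 30, 40))
res <- lag_cols_nested(demo)
res$a$lag1  # NA 1 2 3
```

Naming both levels means a specific lag can be pulled out as res$a$lag1 instead of res[[1]][[2]].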
