Michael Luu
Michael Luu

Reputation: 609

Create new variables with lag data from all current variables

My dataset has about 20 columns and I would like to create 7 new columns with lagged data for each of the 20 current columns.

For example I have column x, y, and z. I would like to create a columns for xlag1, xlag2, xlag3, xlag4, xlag5, xlag6, xlag7, ylag1, ylag2, etc..

My current attempt is with dplyr in R -

aq %>% mutate(., 
          xlag1 = lag(x, 1),
          xlag2 = lag(x, 2),
          xlag3 = lag(x, 3),
          xlag4 = lag(x, 4),
          xlag5 = lag(x, 5),
          xlag6 = lag(x, 6),
          xlag7 = lag(x, 7),
          )

As you can see it'll take alot of lines of codes to cover all 20 columns. Is there a more efficient way of doing this ? If possible in dplyr and R as I'm most familiar with the package.

Upvotes: 1

Views: 526

Answers (1)

akrun
akrun

Reputation: 887911

We can use data.table. The shift from data.table can take a sequence of 'n'.

library(data.table)
setDT(aq)[, paste0('xlag', 1:7) := shift(x, 1:7)]

If there are multiple columns,

setDT(aq)[, paste0(rep(c("xlag", "ylag"), each = 7), 1:7) := 
                 c(shift(x, 1:7), shift(y, 1:7))]

If we have many columns, then specify the columns in .SDcols and loop through the dataset, get the shift, unlist and assign to new columns

setDT(aq)[, paste0(rep(c("xlag", "ylag"), each = 7), 1:7) := 
   unlist(lapply(.SD, shift, n = 1:7), recursive = FALSE) , .SDcols = x:y]

We can also use the shift in dplyr

library(dplyr)
aq %>% 
    do(setNames(data.frame(., shift(.$x, 1:7)), c(names(aq), paste0('xlag', 1:7))))

and for multiple columns

aq %>%
  do(setNames(data.frame(., shift(.$x, 1:7), shift(.$y, 1:7)), 
         c(names(aq), paste0(rep(c("xlag", "ylag"), each = 7), 1:7) )))

data

aq <- data.frame(x = 1:20, y  = 21:40)

Upvotes: 2

Related Questions