Oscar

Reputation: 131

Duplicate Vector in Rolling Manner in R

Say I want to run a regression in which the data for the dependent variable (DV) is taken in a rolling manner. To make this easier, I would first like to "duplicate" that vector, rolling the observations accordingly. As an example, see the data structure below.

# libraries #
library(dplyr)

# reproducible data #
set.seed(123)  # fix the seed so rnorm() returns the same draws on every run
df1 <- tibble(ID = as.factor(rep(c(1, 2), each = 40)),
              YEAR = rep(rep(c(2001:2010), each = 4), 2),
              QTR = rep(c(1:4), 20),
              DV = rnorm(80))

df2 <- tibble(ID = as.factor(rep(c(1, 2), each = 120)),
              YEAR = rep(rep(c(2005:2010), each = 20), 2),
              IV = rnorm(240))

The data is structured this way because the values in df2 are residuals from earlier regressions that also used "rolling" data.

The aim is then to run a model whereby the observations in df1 are "rolled":

My approach is to "duplicate" df1 in a rolling manner so that the regression is easier to run.

So far I have tried rolling it with embed() from base R (the stats package), but that becomes a mess very quickly because my real dataset is much larger. Is there an elegant dplyr alternative?

Thanks!

Upvotes: 0

Views: 35

Answers (1)

akrun

Reputation: 886938

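We could build the start and end row of each rolling window (20 rows wide, moving 4 rows at a time) and slice 'df1' accordingly

library(dplyr)
library(purrr)  # provides map2()

# window start rows (1, 5, 9, ...) and end rows (20, 24, 28, ...)
v1 <- seq(1, nrow(df1), by = 4)
v2 <- seq(20, nrow(df1), by = 4)
# keep only the starts that have a matching end
i1 <- seq_len(min(c(length(v1), length(v2))))
lst1 <- map2(v1[i1], v2[i1], ~ df1 %>%
                           slice(.x:.y))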

Similarly, do this with 'df2', where the windows are consecutive blocks of 20 rows

v11 <- seq(1, nrow(df2), by = 20)
v22 <- seq(20, nrow(df2), by = 20)
i2 <- seq_len(min(c(length(v11), length(v22))))
lst2 <- map2(v11[i2], v22[i2], ~ df2 %>% 
                           slice(.x:.y))

and then use map2 to apply a function to the corresponding elements of both lists (map2 expects the two lists to have the same length)
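
Before pairing the lists, it may be worth checking that they line up. The sketch below recomputes the same indices from the row counts of the question's example data (80 rows in df1, 240 in df2) and counts how many df1 windows span both IDs. The names n1, n2, id_of_row and crosses are introduced here for illustration only.

```r
n1 <- 80    # nrow(df1) in the question's example
n2 <- 240   # nrow(df2) in the question's example

# same start/end indices as above
v1 <- c(1, seq(5, n1, by = 4))
v2 <- seq(20, n1, by = 4)
i1 <- seq_len(min(length(v1), length(v2)))

v11 <- seq(1, n2, by = 20)
v22 <- seq(20, n2, by = 20)
i2 <- seq_len(min(length(v11), length(v22)))

length(i1)  # number of windows cut from df1
length(i2)  # number of windows cut from df2

# in df1, rows 1-40 belong to ID 1 and rows 41-80 to ID 2
id_of_row <- rep(c(1, 2), each = 40)

# TRUE for every df1 window that contains rows from both IDs
crosses <- mapply(function(s, e) length(unique(id_of_row[s:e])) > 1,
                  v1[i1], v2[i1])
sum(crosses)
```

If the two counts differ, or if some windows mix IDs, the lists cannot be paired element by element as they stand, and splitting by 'ID' first is the likely remedy.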

Update

As the OP mentioned grouping by 'ID', one option is to split the data with group_split on 'ID' and then apply the same steps to each piece, so that no window spans two IDs

df1 %>%
    group_split(ID) %>%
    map(~ {
      # .x is the data for a single ID
      v1 <- c(1, seq(5, nrow(.x), by = 4))
      v2 <- seq(20, nrow(.x), by = 4)
      i1 <- seq_len(min(c(length(v1), length(v2))))
      # named arguments x, y avoid clashing with the outer .x
      map2(v1[i1], v2[i1], function(x, y) .x %>%
           slice(x:y))
    })
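
To show how the pieces might fit together, here is a minimal end-to-end sketch on the question's example data. It assumes that the k-th window of df1 within an ID is meant to be paired with the k-th block of 20 rows of df2 for that ID, and that the model of interest is a simple lm of DV on IV; neither is confirmed by the question. roll_windows() is a hypothetical helper written for this sketch, not a dplyr or purrr function.

```r
library(dplyr)
library(purrr)

set.seed(42)  # assumption: fixed seed so the sketch is repeatable

df1 <- tibble(ID = as.factor(rep(c(1, 2), each = 40)),
              YEAR = rep(rep(2001:2010, each = 4), 2),
              QTR = rep(1:4, 20),
              DV = rnorm(80))

df2 <- tibble(ID = as.factor(rep(c(1, 2), each = 120)),
              YEAR = rep(rep(2005:2010, each = 20), 2),
              IV = rnorm(240))

# Hypothetical helper: cut one data frame into windows of `width` rows,
# moving the start forward by `step` rows each time.
roll_windows <- function(d, width, step) {
  starts <- seq(1, nrow(d) - width + 1, by = step)
  map(starts, ~ slice(d, .x:(.x + width - 1)))
}

# Split by ID first so that no window crosses from one ID into the next,
# then flatten the per-ID lists into one list of windows.
win1 <- df1 %>%
  group_split(ID) %>%
  map(~ roll_windows(.x, width = 20, step = 4)) %>%
  unlist(recursive = FALSE)

win2 <- df2 %>%
  group_split(ID) %>%
  map(~ roll_windows(.x, width = 20, step = 20)) %>%
  unlist(recursive = FALSE)

# Pair the windows and fit one model per pair (assumed model: DV ~ IV).
fits <- map2(win1, win2,
             ~ lm(DV ~ IV, data = tibble(DV = .x$DV, IV = .y$IV)))
```

Each element of fits is an ordinary lm object, so something like map(fits, coef) would collect the coefficients per window. Keeping the windows in a list rather than binding them into one long data frame avoids having to track a separate window identifier.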

Upvotes: 1
