Create lag/lead variable in dplyr with a dynamic number of positions

Question

I'm looking for a way to create a column from a lagged column where the number of positions to lag (the argument n) is dynamic: for each row, n should be taken from the value stored in another column (see the documentation for lag).

Sample data:

library(dplyr)

set.seed(42)
df <- as_tibble(data.frame(
  id = c(rep(1,6), rep(2,5), rep(3,6)),
  n_steps = c(0,0,0,0,1,2,0,0,1:3,0,1:5),
  var1 = sample(1:9, 17, replace = TRUE),
  var2 = runif(17, 1, 2)))

# A tibble: 17 x 4
      id n_steps  var1  var2
   <dbl>   <dbl> <int> <dbl>
 1     1       0     1  1.08
 2     1       0     5  1.51
 3     1       0     1  1.39
 4     1       0     9  1.91
 5     1       1     4  1.45
 6     1       2     2  1.84
 7     2       0     1  1.74
 8     2       0     8  1.81
 9     2       1     7  1.39
10     2       2     4  1.69
11     2       3     9  1.00
12     3       0     5  1.83
13     3       1     4  1.01
14     3       2     2  1.21
15     3       3     3  1.91
16     3       4     9  1.61
17     3       5     9  1.38

Concretely, I want to create a variable var3 under the following conditions:

if n_steps == 0, var3 = var1
if n_steps > 0, var3 = var1 + lag of var2 by n_steps steps

I tried the following code:

df %>% 
  group_by(id) %>% 
  mutate(var3 = ifelse(n_steps == 0, var1, var1 + lag(var2, n = n_steps)))

which returns the following error:

Error: Problem with mutate() input var3.
n must be a nonnegative integer scalar, not a double vector of length 6.
Input var3 is ifelse(n_steps == 0, var1, var1 + lag(var2, n = n_steps)).
The error occured in group 1: id = 1.
Run rlang::last_error() to see where the error occurred.

and I understand why: the argument n must be a scalar. I thought of referring to the current row, but I couldn't find anything about that on SO. I could also create one lagged column for each distinct value of n_steps, but I'm looking for a way to use these lags dynamically without actually creating the columns. How can I achieve this?

Expected output:

      id n_steps  var1  var2  var3
 1     1       0     1  1.08  1   
 2     1       0     5  1.51  5   
 3     1       0     1  1.39  1   
 4     1       0     9  1.91  9   
 5     1       1     4  1.45  5.91
 6     1       2     2  1.84  3.91
 7     2       0     1  1.74  1   
 8     2       0     8  1.81  8   
 9     2       1     7  1.39  8.81
10     2       2     4  1.69  5.81
11     2       3     9  1.00 10.81 
12     3       0     5  1.83  5   
13     3       1     4  1.01  5.83
14     3       2     2  1.21  3.83
15     3       3     3  1.91  4.83
16     3       4     9  1.61 10.83 
17     3       5     9  1.38 10.83

BellmanEqn · Accepted Answer

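Try indexing var2 directly by row position: within each group, row_number() - n_steps is the position of the row n_steps rows back.

df %>% 
  group_by(id) %>%  # keep the lookup within each id, as in the question
  mutate(var3 = ifelse(n_steps == 0, var1, var1 + var2[row_number() - n_steps]))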
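The indexing approach above relies on row_number() - n_steps never dropping below 1, which holds for the sample data but may not hold in general: an index of 0 is silently dropped and negative indices cannot be mixed with positive ones. A minimal sketch of a guarded variant follows, assuming out-of-range lags should become NA (as lag does by default). The helper name lag_by and the small data frame are hypothetical and for illustration only; lag_by is not part of dplyr.

```r
library(dplyr)

# Hypothetical helper (not a dplyr function): lag `x` by a per-element
# offset `n`. Positions that would fall before the start become NA.
lag_by <- function(x, n) {
  idx <- seq_along(x) - n
  idx[which(idx < 1)] <- NA   # guard against zero and negative indices
  x[idx]
}

# Small made-up data set; the last row asks for a lag longer than its group.
df_small <- tibble(
  id      = c(1, 1, 1, 2, 2, 2),
  n_steps = c(0, 1, 2, 0, 1, 5),
  var1    = c(1, 2, 3, 4, 5, 6),
  var2    = c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5)
)

out <- df_small %>%
  group_by(id) %>%   # lags are computed within each id
  mutate(var3 = if_else(n_steps == 0, var1, var1 + lag_by(var2, n_steps))) %>%
  ungroup()

out$var3
# 1.0 2.5 3.5 4.0 8.5 NA
```

Grouping by id keeps each lookup inside its own group, so a lag can never reach into the previous id's rows.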
