two questions about creating lags within groups and with respect to time

Question

I have a data frame x like this:

ID year    month     vol   sum_vol        
1   2000     1        1       6                 
1   2000     2        2       6                  
1   2000     3        3       6                  
1   2001     3        4       25                  
1   2001     4        5       25                  
1   2001     5        16      25                  
2   2000     1        7       24                
2   2000     2        8       24                 
2   2000     3        9       24                
2   2001     3        12      35                 
2   2001     4        11      35                 
2   2001     5        12      35                 
3   2000     1        13      42                 
3   2000     2        14      42                 
3   2000     3        15      42                 
3   2001     6        16      44          
3   2001     7        10      44
3   2001     8        18      44

And the desired output:

ID year    month     vol   sum_vol        lag_year_sum_vol      lag_2_month_vol
1   2000     1        1       6                  NA                    NA
1   2000     2        2       6                  NA                    NA
1   2000     3        3       6                  NA                    1
1   2001     3        4       25                  6                    NA
1   2001     4        5       25                  6                    NA
1   2001     5        16      25                  6                    4
2   2000     1        7       24                 NA                    NA
2   2000     2        8       24                 NA                    NA
2   2000     3        9       24                 NA                    7
2   2001     3        12      35                 24                    NA
2   2001     4        11      35                 24                    NA
2   2001     5        12      35                 24                    12
3   2000     1        13      42                 NA                    NA
3   2000     2        14      42                 NA                    NA
3   2000     3        15      42                 NA                    13
3   2001     6        16      44                 42                    NA
3   2001     7        10      44                 42                    NA
3   2001     8        18      44                 42                    16

I have searched a lot and not getting a conclusion:

So as you can see my questions are:

1) How to create a variable lag_year_sum_vol for one year per ID, which has a value of last year's sum_vol?

2) How to have lagged new variables lag_2_month_vol per customer per year, by 2 months?

NOTE: The ID, year and month in the real data may not be in this sorted order. And actually for any arbitrary number for year, month, and vol -> no pattern in the data.

I prefer a method in either dplyr or data.table. (It seems data.table is more concise.)

Thanks in advance!!

talat · Accepted Answer

Here's an option using data.table:

library(data.table)

# column 1
dt[dt[, .(ID, year = year +1, sum_vol)], on = .(ID, year), 
   lag_year_sum_vol := i.sum_vol]

# column 2
dt[dt[, .(ID, year, month = month+2, vol)], on = .(ID, year, month),
   lag_2_month_vol := i.vol]

As you can see, I temporarily modified the data, joined and updated the original data in both cases. There are, of course, other ways to do this using data.table.

The result is:

    ID year month vol sum_vol lag_year_sum_vol lag_2_month_vol
 1:  1 2000     1   1       6               NA              NA
 2:  1 2000     2   2       6               NA              NA
 3:  1 2000     3   3       6               NA               1
 4:  1 2001     3   4      25                6              NA
 5:  1 2001     4   5      25                6              NA
 6:  1 2001     5  16      25                6               4
 7:  2 2000     1   7      24               NA              NA
 8:  2 2000     2   8      24               NA              NA
 9:  2 2000     3   9      24               NA               7
10:  2 2001     3  12      35               24              NA
11:  2 2001     4  11      35               24              NA
12:  2 2001     5  12      35               24              12
13:  3 2000     1  13      42               NA              NA
14:  3 2000     2  14      42               NA              NA
15:  3 2000     3  15      42               NA              13
16:  3 2001     6  16      44               42              NA
17:  3 2001     7  10      44               42              NA
18:  3 2001     8  18      44               42              16

two questions about creating lags within groups and with respect to time

Answers (2)

Related Questions