Rads
Rads

Reputation: 355

Lagging in a data.table in R which is time series panel data

I was trying to use lag and data table notation to lag the variables, so that it is fastest. This is how I was trying to do it,

head(DT)
setkey(DT,code,year)

The output is

     code year pt_N_1y ws_country close is_msci
1: 130104 2003       0     ISRAEL     0       0     
2: 130104 2004       0     ISRAEL     0       0   
3: 130104 2005       0     ISRAEL     0       0    
4: 130104 2006       0     ISRAEL     0       0    
5: 130104 2007       0     ISRAEL     0       0    
6: 130104 2008       0     ISRAEL     0       0   
DT[,L1_is_msci:=.SD[lag(is_msci,1)],by=code]

This gives 50 warnings and gives all NA's. Isn't .SD supposed to subset the data by "code" and apply the function lag(is_msci, 1). I would ideally like a 1 line function to do the lags and would like to work with base functions and data table notation as it is the most optimal while dealing with huge datasets without installing many packages. Is it possible?

What I want to achieve is

     code year pt_N_1y ws_country close is_msci L1_is_msci
1: 130104 2003       0     ISRAEL     0       0    NA 
2: 130104 2004       0     ISRAEL     0       0    0
3: 130104 2005       0     ISRAEL     0       0    0
4: 130104 2006       0     ISRAEL     0       0    0
5: 130104 2007       0     ISRAEL     0       0    0
6: 130104 2008       0     ISRAEL     0       0    0

Upvotes: 1

Views: 401

Answers (1)

Arun
Arun

Reputation: 118789

In the future, please provide a reproducible example - one from which we can just copy/paste the code to get the data.

I'm not sure which function you're using for lag - you don't show it. The base function lag (from stats) seems to do something different. If you're using that, no wonder your results aren't what you expect. It's quite easy to test the function you're using:

# base function from package stats
lag(1:5, k=1L)
# [1] 1 2 3 4 5
# attr(,"tsp")
# [1] 0 4 1

Obviously not what we want. So, let's write our own function:

lag <- function(x, n)
    if (n == 0) x else c(rep.int(NA, n), head(x, -n))

# test it
lag(1:5, 0)
# [1] 1 2 3 4 5

lag(1:5, 2)
# [1] NA NA  1  2  3

Now, you an apply @eddi's code:

DT[, L1_is_msci := lag(is_msci, 1L), by = code]
#      code year pt_N_1y ws_country close is_msci L1_is_msci
# 1: 130104 2003       0     ISRAEL     0       0         NA
# 2: 130104 2004       0     ISRAEL     0       0          0
# 3: 130104 2005       0     ISRAEL     0       0          0
# 4: 130104 2006       0     ISRAEL     0       0          0
# 5: 130104 2007       0     ISRAEL     0       0          0
# 6: 130104 2008       0     ISRAEL     0       0          0

HTH

Upvotes: 1

Related Questions