Andrew
Andrew

Reputation: 688

A column of conditional lagged values in R with data.table

I have a data.table that looks like this:

df
     seller_id buyer_id       hs10 year mean_ln_c_hit
  1:         1        5 1200000000    5      4.407456
  2:         1       24 1500000000    2      4.173422
  3:         1        8 1500000000    3      4.532695
  4:         1        8 1500000000    5      4.830106
  5:         1       16 1500000000    5      4.830106
  6:         1       17 1800000000    3      3.920435
  7:         1       20 1800000000    5      4.357196
  8:         1       27 1800000000    5      4.357196
  9:         1        1 1900000000    1      4.835762
 10:         1        2 1900000000    1      4.835762
 11:         1        3 1900000000    1      4.835762
 12:         1        4 1900000000    1      4.835762
 13:         1        5 1900000000    1      4.835762
 14:         1        6 1900000000    1      4.835762
 15:         1        7 1900000000    1      4.835762
 16:         1        8 1900000000    1      4.835762
 17:         1        9 1900000000    1      4.835762
 18:         1       11 1900000000    1      4.835762
 19:         1       12 1900000000    1      4.835762
 20:         1       13 1900000000    1      4.835762
 21:         1       14 1900000000    1      4.835762
 22:         1       15 1900000000    1      4.835762
 23:         1       16 1900000000    1      4.835762
 24:         1       17 1900000000    1      4.835762
 25:         1       18 1900000000    1      4.835762
 26:         1       19 1900000000    1      4.835762
 27:         1       20 1900000000    1      4.835762
 28:         1       21 1900000000    1      4.835762
 29:         1       22 1900000000    1      4.835762
 30:         1       23 1900000000    1      4.835762
 31:         1       24 1900000000    1      4.835762
 32:         1       25 1900000000    1      4.835762
 33:         1       26 1900000000    1      4.835762
 34:         1       27 1900000000    1      4.835762
 35:         1       28 1900000000    1      4.835762
 36:         1       29 1900000000    1      4.835762
 37:         1       30 1900000000    1      4.835762
 38:         1        1 1900000000    2      4.409253
 39:         1        2 1900000000    2      4.409253
 40:         1        3 1900000000    2      4.409253
 41:         1        4 1900000000    2      4.409253
 42:         1        5 1900000000    2      4.409253
 43:         1        6 1900000000    2      4.409253
 44:         1        7 1900000000    2      4.409253
 45:         1        8 1900000000    2      4.409253
 46:         1        9 1900000000    2      4.409253
 47:         1       10 1900000000    2      4.409253
 48:         1       11 1900000000    2      4.409253
 49:         1       12 1900000000    2      4.409253
 50:         1       13 1900000000    2      4.409253
 51:         1       14 1900000000    2      4.409253
 52:         1       15 1900000000    2      4.409253
 53:         1       16 1900000000    2      4.409253
 54:         1       17 1900000000    2      4.409253
 55:         1       18 1900000000    2      4.409253
 56:         1       19 1900000000    2      4.409253
 57:         1       20 1900000000    2      4.409253
 58:         1       21 1900000000    2      4.409253
 59:         1       22 1900000000    2      4.409253
 60:         1       23 1900000000    2      4.409253
 61:         1       25 1900000000    2      4.409253
 62:         1       26 1900000000    2      4.409253
 63:         1       27 1900000000    2      4.409253
 64:         1       28 1900000000    2      4.409253
 65:         1       29 1900000000    2      4.409253
 66:         1       30 1900000000    2      4.409253
 67:         1        1 1900000000    3      4.514642
 68:         1        3 1900000000    3      4.514642
 69:         1        4 1900000000    3      4.514642
 70:         1        5 1900000000    3      4.514642
 71:         1        6 1900000000    3      4.514642
 72:         1        7 1900000000    3      4.514642
 73:         1        9 1900000000    3      4.514642
 74:         1       11 1900000000    3      4.514642
 75:         1       12 1900000000    3      4.514642
 76:         1       13 1900000000    3      4.514642
 77:         1       14 1900000000    3      4.514642
 78:         1       15 1900000000    3      4.514642
 79:         1       16 1900000000    3      4.514642
 80:         1       18 1900000000    3      4.514642
 81:         1       19 1900000000    3      4.514642
 82:         1       20 1900000000    3      4.514642
 83:         1       21 1900000000    3      4.514642
 84:         1       22 1900000000    3      4.514642
 85:         1       23 1900000000    3      4.514642
 86:         1       24 1900000000    3      4.514642
 87:         1       25 1900000000    3      4.514642
 88:         1       26 1900000000    3      4.514642
 89:         1       27 1900000000    3      4.514642
 90:         1       28 1900000000    3      4.514642
 91:         1       29 1900000000    3      4.514642
 92:         1       30 1900000000    3      4.514642
 93:         1        2 1900000000    5      4.698335
 94:         1        3 1900000000    5      4.698335
 95:         1        4 1900000000    5      4.698335
 96:         1        6 1900000000    5      4.698335
 97:         1        7 1900000000    5      4.698335
 98:         1        9 1900000000    5      4.698335
 99:         1       11 1900000000    5      4.698335
100:         1       12 1900000000    5      4.698335

I want to use data.table feature to create a new column called lag_mean_ln_c_hit which contains the one-year lag lag of the value in column mean_ln_c_hit, conditional on the seller_id and hs10. My best attempt is:

df[!is.na(year), lag_mean_ln_c_hit:= 
                        (.SD[.(seller_id = seller_id, hs10 = hs10, year = year - 1), mean_ln_c_hit, on = c("seller_id", "hs10", "year"), allow.cartesian = TRUE ]) ]

which kind of correctly pics when to display NAs (i.e., year = 1 is the first year, so the lag for all observations in the first year should be NA), but fails to capture the right lags. It always report 4.835762 for any other combination that is not NA (albeit once it reports 4.173422).

Each seller_id has multiple buyer_id entries with the same mean_ln_c_hit value. Therefore I have duplicates. Running the code, I get the following Warning as well

Warning message:
In `[.data.table`(df, !is.na(year), `:=`(lag_mean_ln_c_hit, (.SD[.(seller_id = seller_id,  :
  Supplied 1640 items to be assigned to 100 items of column 'lag_mean_ln_c_hit' (1540 unused)

The code would not run without allow.cartesian = TRUE.

Any idea on how to solve this issue? It should be pretty simple, but I can't get my head around it.

Upvotes: 0

Views: 49

Answers (1)

chinsoon12
chinsoon12

Reputation: 25225

No clear what is the desired output. If it has to be exactly one year ago, here is an option:

seller <- unique(df, by=c("seller_id", "hs10", "year"))
df[, oneyearago := year - 1L]
df[, lag_mean_ln_c_hit := seller[.SD, on=.(seller_id, hs10, year=oneyearago), x.mean_ln_c_hit]]

output:

     seller_id buyer_id       hs10 year mean_ln_c_hit oneyearago lag_mean_ln_c_hit
  1:         1        5 1200000000    5      4.407456          4                NA
  2:         1       24 1500000000    2      4.173422          1                NA
  3:         1        8 1500000000    3      4.532695          2          4.173422
  4:         1        8 1500000000    5      4.830106          4                NA
  5:         1       16 1500000000    5      4.830106          4                NA
  6:         1       17 1800000000    3      3.920435          2                NA
  7:         1       20 1800000000    5      4.357196          4                NA
  8:         1       27 1800000000    5      4.357196          4                NA
  9:         1        1 1900000000    1      4.835762          0                NA
 10:         1        2 1900000000    1      4.835762          0                NA
 11:         1        3 1900000000    1      4.835762          0                NA
 12:         1        4 1900000000    1      4.835762          0                NA
 13:         1        5 1900000000    1      4.835762          0                NA
 14:         1        6 1900000000    1      4.835762          0                NA
 15:         1        7 1900000000    1      4.835762          0                NA
 16:         1        8 1900000000    1      4.835762          0                NA
 17:         1        9 1900000000    1      4.835762          0                NA
 18:         1       11 1900000000    1      4.835762          0                NA
 19:         1       12 1900000000    1      4.835762          0                NA
 20:         1       13 1900000000    1      4.835762          0                NA
 21:         1       14 1900000000    1      4.835762          0                NA
 22:         1       15 1900000000    1      4.835762          0                NA
 23:         1       16 1900000000    1      4.835762          0                NA
 24:         1       17 1900000000    1      4.835762          0                NA
 25:         1       18 1900000000    1      4.835762          0                NA
 26:         1       19 1900000000    1      4.835762          0                NA
 27:         1       20 1900000000    1      4.835762          0                NA
 28:         1       21 1900000000    1      4.835762          0                NA
 29:         1       22 1900000000    1      4.835762          0                NA
 30:         1       23 1900000000    1      4.835762          0                NA
 31:         1       24 1900000000    1      4.835762          0                NA
 32:         1       25 1900000000    1      4.835762          0                NA
 33:         1       26 1900000000    1      4.835762          0                NA
 34:         1       27 1900000000    1      4.835762          0                NA
 35:         1       28 1900000000    1      4.835762          0                NA
 36:         1       29 1900000000    1      4.835762          0                NA
 37:         1       30 1900000000    1      4.835762          0                NA
 38:         1        1 1900000000    2      4.409253          1          4.835762
 39:         1        2 1900000000    2      4.409253          1          4.835762
 40:         1        3 1900000000    2      4.409253          1          4.835762
 41:         1        4 1900000000    2      4.409253          1          4.835762
 42:         1        5 1900000000    2      4.409253          1          4.835762
 43:         1        6 1900000000    2      4.409253          1          4.835762
 44:         1        7 1900000000    2      4.409253          1          4.835762
 45:         1        8 1900000000    2      4.409253          1          4.835762
 46:         1        9 1900000000    2      4.409253          1          4.835762
 47:         1       10 1900000000    2      4.409253          1          4.835762
 48:         1       11 1900000000    2      4.409253          1          4.835762
 49:         1       12 1900000000    2      4.409253          1          4.835762
 50:         1       13 1900000000    2      4.409253          1          4.835762
 51:         1       14 1900000000    2      4.409253          1          4.835762
 52:         1       15 1900000000    2      4.409253          1          4.835762
 53:         1       16 1900000000    2      4.409253          1          4.835762
 54:         1       17 1900000000    2      4.409253          1          4.835762
 55:         1       18 1900000000    2      4.409253          1          4.835762
 56:         1       19 1900000000    2      4.409253          1          4.835762
 57:         1       20 1900000000    2      4.409253          1          4.835762
 58:         1       21 1900000000    2      4.409253          1          4.835762
 59:         1       22 1900000000    2      4.409253          1          4.835762
 60:         1       23 1900000000    2      4.409253          1          4.835762
 61:         1       25 1900000000    2      4.409253          1          4.835762
 62:         1       26 1900000000    2      4.409253          1          4.835762
 63:         1       27 1900000000    2      4.409253          1          4.835762
 64:         1       28 1900000000    2      4.409253          1          4.835762
 65:         1       29 1900000000    2      4.409253          1          4.835762
 66:         1       30 1900000000    2      4.409253          1          4.835762
 67:         1        1 1900000000    3      4.514642          2          4.409253
 68:         1        3 1900000000    3      4.514642          2          4.409253
 69:         1        4 1900000000    3      4.514642          2          4.409253
 70:         1        5 1900000000    3      4.514642          2          4.409253
 71:         1        6 1900000000    3      4.514642          2          4.409253
 72:         1        7 1900000000    3      4.514642          2          4.409253
 73:         1        9 1900000000    3      4.514642          2          4.409253
 74:         1       11 1900000000    3      4.514642          2          4.409253
 75:         1       12 1900000000    3      4.514642          2          4.409253
 76:         1       13 1900000000    3      4.514642          2          4.409253
 77:         1       14 1900000000    3      4.514642          2          4.409253
 78:         1       15 1900000000    3      4.514642          2          4.409253
 79:         1       16 1900000000    3      4.514642          2          4.409253
 80:         1       18 1900000000    3      4.514642          2          4.409253
 81:         1       19 1900000000    3      4.514642          2          4.409253
 82:         1       20 1900000000    3      4.514642          2          4.409253
 83:         1       21 1900000000    3      4.514642          2          4.409253
 84:         1       22 1900000000    3      4.514642          2          4.409253
 85:         1       23 1900000000    3      4.514642          2          4.409253
 86:         1       24 1900000000    3      4.514642          2          4.409253
 87:         1       25 1900000000    3      4.514642          2          4.409253
 88:         1       26 1900000000    3      4.514642          2          4.409253
 89:         1       27 1900000000    3      4.514642          2          4.409253
 90:         1       28 1900000000    3      4.514642          2          4.409253
 91:         1       29 1900000000    3      4.514642          2          4.409253
 92:         1       30 1900000000    3      4.514642          2          4.409253
 93:         1        2 1900000000    5      4.698335          4                NA
 94:         1        3 1900000000    5      4.698335          4                NA
 95:         1        4 1900000000    5      4.698335          4                NA
 96:         1        6 1900000000    5      4.698335          4                NA
 97:         1        7 1900000000    5      4.698335          4                NA
 98:         1        9 1900000000    5      4.698335          4                NA
 99:         1       11 1900000000    5      4.698335          4                NA
100:         1       12 1900000000    5      4.698335          4                NA
     seller_id buyer_id       hs10 year mean_ln_c_hit oneyearago lag_mean_ln_c_hit

Or if it has be at least one year ago, you can use a non-equi join and pick the last match:

seller <- unique(df, by=c("seller_id", "hs10", "year"))
df[, lag_mean_ln_c_hit := seller[.SD, on=.(seller_id, hs10, year<year), x.mean_ln_c_hit, mult="last"]]

data:

library(data.table)
df <- fread("seller_id buyer_id       hs10 year mean_ln_c_hit
1        5 1200000000    5      4.407456
1       24 1500000000    2      4.173422
1        8 1500000000    3      4.532695
1        8 1500000000    5      4.830106
1       16 1500000000    5      4.830106
1       17 1800000000    3      3.920435
1       20 1800000000    5      4.357196
1       27 1800000000    5      4.357196
1        1 1900000000    1      4.835762
1        2 1900000000    1      4.835762
1        3 1900000000    1      4.835762
1        4 1900000000    1      4.835762
1        5 1900000000    1      4.835762
1        6 1900000000    1      4.835762
1        7 1900000000    1      4.835762
1        8 1900000000    1      4.835762
1        9 1900000000    1      4.835762
1       11 1900000000    1      4.835762
1       12 1900000000    1      4.835762
1       13 1900000000    1      4.835762
1       14 1900000000    1      4.835762
1       15 1900000000    1      4.835762
1       16 1900000000    1      4.835762
1       17 1900000000    1      4.835762
1       18 1900000000    1      4.835762
1       19 1900000000    1      4.835762
1       20 1900000000    1      4.835762
1       21 1900000000    1      4.835762
1       22 1900000000    1      4.835762
1       23 1900000000    1      4.835762
1       24 1900000000    1      4.835762
1       25 1900000000    1      4.835762
1       26 1900000000    1      4.835762
1       27 1900000000    1      4.835762
1       28 1900000000    1      4.835762
1       29 1900000000    1      4.835762
1       30 1900000000    1      4.835762
1        1 1900000000    2      4.409253
1        2 1900000000    2      4.409253
1        3 1900000000    2      4.409253
1        4 1900000000    2      4.409253
1        5 1900000000    2      4.409253
1        6 1900000000    2      4.409253
1        7 1900000000    2      4.409253
1        8 1900000000    2      4.409253
1        9 1900000000    2      4.409253
1       10 1900000000    2      4.409253
1       11 1900000000    2      4.409253
1       12 1900000000    2      4.409253
1       13 1900000000    2      4.409253
1       14 1900000000    2      4.409253
1       15 1900000000    2      4.409253
1       16 1900000000    2      4.409253
1       17 1900000000    2      4.409253
1       18 1900000000    2      4.409253
1       19 1900000000    2      4.409253
1       20 1900000000    2      4.409253
1       21 1900000000    2      4.409253
1       22 1900000000    2      4.409253
1       23 1900000000    2      4.409253
1       25 1900000000    2      4.409253
1       26 1900000000    2      4.409253
1       27 1900000000    2      4.409253
1       28 1900000000    2      4.409253
1       29 1900000000    2      4.409253
1       30 1900000000    2      4.409253
1        1 1900000000    3      4.514642
1        3 1900000000    3      4.514642
1        4 1900000000    3      4.514642
1        5 1900000000    3      4.514642
1        6 1900000000    3      4.514642
1        7 1900000000    3      4.514642
1        9 1900000000    3      4.514642
1       11 1900000000    3      4.514642
1       12 1900000000    3      4.514642
1       13 1900000000    3      4.514642
1       14 1900000000    3      4.514642
1       15 1900000000    3      4.514642
1       16 1900000000    3      4.514642
1       18 1900000000    3      4.514642
1       19 1900000000    3      4.514642
1       20 1900000000    3      4.514642
1       21 1900000000    3      4.514642
1       22 1900000000    3      4.514642
1       23 1900000000    3      4.514642
1       24 1900000000    3      4.514642
1       25 1900000000    3      4.514642
1       26 1900000000    3      4.514642
1       27 1900000000    3      4.514642
1       28 1900000000    3      4.514642
1       29 1900000000    3      4.514642
1       30 1900000000    3      4.514642
1        2 1900000000    5      4.698335
1        3 1900000000    5      4.698335
1        4 1900000000    5      4.698335
1        6 1900000000    5      4.698335
1        7 1900000000    5      4.698335
1        9 1900000000    5      4.698335
1       11 1900000000    5      4.698335
1       12 1900000000    5      4.698335")

Upvotes: 1

Related Questions