Why does data.table change the key after a transformation in j?

Question

I have a data.table with a two-column key (id, date) and one or more data columns. Some of the data may be missing, so I use na.locf() from zoo to fill the gaps. I have noticed that this operation changes the key of my data.table, so I have to re-key it before subsequent joins. Why does this happen, and in what other situations can I expect this behavior?

You can use the code below to reproduce the issue.

Thanks!

require(data.table)
require(zoo)
d <- data.table(id = rep(1:2, each = 5), date = rep(1:5, 2), value = c(1,2,NA,NA,NA, 6,7,8,9,10))
setkey(d, id, date)
x <- d[, lapply(.SD, na.locf, na.rm = FALSE, maxgap = 1), by = 'id']

key(d)
key(x)

BrodieG · Accepted Answer

I think this does what you want:

x <- copy(d)
x[, (3:length(x)) := lapply(.SD, na.locf, maxgap = 1), by = 'id', .SDcols=3:length(x)]
key(x)

Results in:

[1] "id"   "date"

And x:

    id date value
 1:  1    1     1
 2:  1    2     2
 3:  1    3     1
 4:  1    4     2
 5:  1    5     1
 6:  2    1     6
 7:  2    2     7
 8:  2    3     8
 9:  2    4     9
10:  2    5    10

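This assumes you don't need na.locf applied to the date column. Because := updates only the non-key columns by reference and leaves id and date untouched, the key on the table is preserved. In your original call, j builds a new table, and data.table has no way of knowing that date is still sorted after the transformation, so the full (id, date) key is not carried over.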
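
To make the difference concrete, here is a minimal sketch comparing the two styles on data shaped like the question's. The identity function in j is only a stand-in for na.locf (an assumption made so the example does not depend on zoo), and the names agg and upd are illustrative.

```r
library(data.table)

# Same shape as the question's data: key on (id, date), one value column.
d <- data.table(id = rep(1:2, each = 5), date = rep(1:5, 2),
                value = c(1, 2, NA, NA, NA, 6, 7, 8, 9, 10))
setkey(d, id, date)

# Grouped j without := builds a brand-new table; the identity function
# stands in for na.locf here. The (id, date) key is not carried over in full.
agg <- d[, lapply(.SD, function(v) v), by = "id"]
print(key(agg))

# := on a copy modifies only the non-key column in place, so the key survives.
upd <- copy(d)
upd[, value := value * 2, by = "id"]
print(key(upd))

# If the first form is needed, the key can be restored explicitly afterwards.
setkey(agg, id, date)
print(key(agg))
```

Re-keying with setkey() is what the question already does; the := form simply avoids the need for it when the key columns are not modified.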

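Also, I switched na.locf's na.rm back to its default (TRUE), because with na.rm = FALSE and maxgap = 1 the call returned the column unchanged: the run of three NAs for id 1 is longer than maxgap, so nothing is filled. Be aware that with na.rm = TRUE the unfilled NAs are dropped instead, so for id 1 na.locf returns only c(1, 2), and the 1, 2, 1 in rows 3 to 5 above is that shorter vector recycled, not carried-forward data. Newer data.table versions may refuse such an assignment with an error instead of recycling. To actually fill the gap, raise maxgap (or omit it) and keep na.rm = FALSE so that each group's result has the same length as its input.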
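
The effect of maxgap can be checked on a plain vector, outside data.table. This is a small sketch using the values of id 1 from the question; the variable names are illustrative, and the exact behavior of na.rm = TRUE is best confirmed against your installed zoo version.

```r
library(zoo)

# Values of id 1 from the question: two observations, then a run of three NAs.
v <- c(1, 2, NA, NA, NA)

# The run of three NAs is longer than maxgap = 1, so nothing is filled.
capped <- na.locf(v, na.rm = FALSE, maxgap = 1)
print(capped)

# Without maxgap (default Inf), the last observation is carried forward.
filled <- na.locf(v, na.rm = FALSE)
print(filled)

# With the default na.rm = TRUE, NAs left unfilled may be dropped, so the
# result can be shorter than the input; compare the lengths before assigning.
print(length(na.locf(v, maxgap = 1)))
print(length(v))
```

Keeping na.rm = FALSE guarantees a result of the same length as the input, which is what a by-group := assignment needs.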
