My issue is that when I pass a data.table to a set-up function and set the key there (with setkeyv), the key is lost when the function exits.
Could one of the data.table gurus explain what I am doing wrong? I was expecting the variable dt to keep the key set within the function, as I thought data.tables are passed by reference. What am I misunderstanding about scoping, and how do I fix it? (I am using data.table 1.9.6, R 3.2.2.)
library(data.table)
library(lubridate)  # provides parse_date_time(), used in PrepareData below
# test data here for Stack overflow
# yes I know one can key the data.table here
# but normally read my data from csv file
dt <- data.table(date = c("2015-12-31","2016-01-01"),
class = c("a","b"),
units = c(1000, 200))
tables()
# no key as you expect as not set
#NAME NROW NCOL MB COLS KEY
#[1,] dt 2 3 1 date,class,units
Then I want to clean imported csv data in a function, and key the table.
PrepareData <- function(x.dt, date.col, key.col) {
  # prepare unit price data table by keying on given columns and
  # converting date columns to date class
  require(data.table)
  # convert dates if date.col not blank
  if (!missing(date.col)) {
    if (nchar(date.col[1]) > 1) {
      for (j in date.col) {
        set(x.dt, j = j,
            value = as.IDate(parse_date_time(x.dt[[j]], c("Ymd", "dmY"))))
        # Since data.table likes integer based dates
      }
    }
  }
  # add key
  if (!missing(key.col)) {
    if (nchar(key.col[1]) > 1) {
      setkeyv(x.dt, key.col)
    }
  }
  # tables here shows a key is set
  tables()
  #NAME NROW NCOL MB COLS KEY
  #[1,] x.dt 2 3 1 date,class,units date,class
  return(x.dt)
}
But calling this function loses the key. I expected the key to be preserved when the table is passed back.
my.key.cols <-c("date", "class") # key columns
dt <- PrepareData(dt, "date", my.key.cols)
tables()
#NAME NROW NCOL MB COLS KEY
#[1,] dt 2 3 1 date,class,units
#
# why did dt not keep the key? How should I fix this?
EDIT: Solved. The DT package was causing the issue.
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_2.0.0 DT_0.1 lubridate_1.5.0 data.table_1.9.6
loaded via a namespace (and not attached):
[1] Rcpp_0.12.2 digest_0.6.8 plyr_1.8.3 chron_2.3-47 grid_3.2.2 gtable_0.1.2 magrittr_1.5
[8] scales_0.3.0 stringi_1.0-1 tools_3.2.2 stringr_1.0.0 htmlwidgets_0.5 munsell_0.4.2 colorspace_1.2-6
[15] htmltools_0.3
Removing DT fixed the problem.
Upvotes: 1
Views: 72
Big thanks to @David Arenburg for confirming it worked on his side and pointing out the old package-conflict chestnut!
Running sessionInfo() showed that I had the DT package loaded. This was conflicting with data.table. Removing the package fixed the error.
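For anyone wanting to check for this kind of conflict before removing packages, here is a minimal sketch using base R tools. It assumes a session with data.table attached and does not reproduce the DT conflict itself; it only shows how to see which package an unqualified call to setkeyv would resolve to.

```r
library(data.table)

# Every attached environment that defines `setkeyv`, in search-path order.
# The first entry is the one an unqualified call will use.
find("setkeyv")

# Name of the namespace that owns the function currently bound to `setkeyv`.
environmentName(environment(setkeyv))

# All names defined in more than one attached environment (masked objects),
# grouped by the environment that defines them.
conflicts(detail = TRUE)
```

If the namespace reported is anything other than data.table, an attached package or a global object is masking the function.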
FIX
# specifying data.table explicitly with :: helped
data.table::setkeyv(x.dt, key.col)
I am leaving this solution in case others run into this conflict.
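To confirm the key survives the function call, something like the sketch below can be used. KeyTable is a hypothetical cut-down stand-in for PrepareData (keying only, no date conversion), and the check uses key() and haskey() rather than tables().

```r
library(data.table)

dt <- data.table(date  = c("2015-12-31", "2016-01-01"),
                 class = c("a", "b"),
                 units = c(1000, 200))

# Hypothetical helper: sets the key by reference on the table passed in.
KeyTable <- function(x.dt, key.col) {
  # The namespace-qualified call cannot be masked by another package.
  data.table::setkeyv(x.dt, key.col)
  invisible(x.dt)
}

# No reassignment needed: setkeyv modifies the table in place.
KeyTable(dt, c("date", "class"))

key(dt)     # key columns of dt, or NULL when no key is set
haskey(dt)  # TRUE when dt has a key
```

Because setkeyv works by reference, the key should be visible on dt in the calling environment even without assigning the function's return value.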
Upvotes: 1