Reputation: 1699
I'm finding some sharp edges regarding specific column names in data.table. How can I avoid cutting myself on them? Assume I have a data.table with two columns, 'type' and 'value'.
library(data.table)
numRows = 100
numTypes = 10
dt = data.table(type=sample(numTypes, numRows, replace=TRUE),
                value=rnorm(numRows))
If I want to quickly calculate the mean for all rows with type==3, this works great:
dt[type==3, mean(value)]
# [1] 0.08086124
But what if "someone who is not me" came along and decided that 'type' is a poor name for the column, and that it really should be 'class'?
setnames(dt, "type", "class")
Now when I try the equivalent operation I get scary error messages:
dt[class==3, mean(value)]
# Error in setattr(attr(x, "index"), paste(cols, collapse = "__"), o) :
# attempt to set invalid 'class' attribute
Is this expected behavior (for 1.9.4 on OS X)? I presume it happens because 'class' is a function name in R, and something internal to data.table is interpreting it as such. Wrapping the i clause in parentheses seems to solve the problem:
dt[(class==3), mean(value)]
# [1] 0.08086124
But maybe there are cases where this workaround fails too?
Is there a list of column names that are expected to fail in this case?
Can user defined functions or loaded libraries cause the same error?
Is there in general a safer way to do this that I should be using?
Upvotes: 1
Views: 561
Reputation: 16697
This seems to have been fixed already. Update your data.table package; with a current version, the same example runs without error:
library(data.table)
set.seed(1)
numRows = 100
numTypes = 10
dt = data.table(type=sample(numTypes, numRows, replace=TRUE),
                value=rnorm(numRows))
setnames(dt,"type","class")
dt[class==3, mean(value)]
# [1] -0.2300146
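If updating is not an option, here is a minimal sketch of how you might check the installed version and filter in ways that seem to avoid the error. The parenthesized form is the workaround from the question. Precomputing the logical vector is my own suggestion, on the assumption that the error comes from the automatic index created for a bare `column == value` in i. The names keep, m_paren, m_vec and m_base are just illustrative.

```r
library(data.table)

# Show which data.table version is installed (the question used 1.9.4).
print(packageVersion("data.table"))

set.seed(1)
dt <- data.table(class = sample(10, 100, replace = TRUE),
                 value = rnorm(100))

# Workaround from the question: wrap the i expression in parentheses.
m_paren <- dt[(class == 3), mean(value)]

# Alternative (assumption: this avoids the same code path): build the
# logical filter first, then pass it to i as a plain vector.
keep <- dt[["class"]] == 3
m_vec <- dt[keep, mean(value)]

# Base R computation for comparison.
m_base <- mean(dt$value[dt$class == 3])

# All three should agree.
print(all.equal(m_paren, m_vec))
print(all.equal(m_vec, m_base))
```

Both forms pass i as something other than a bare `class == 3` call, so they should behave the same across versions, but I have only reasoned about this rather than tested it on 1.9.4.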
Upvotes: 2