Reputation: 1640
I ran into an unexpected problem when trying to convert multiple columns of a data table into factor columns. I've reproduced it as follows:
library(data.table)
tst <- data.table('a' = c('b','b','c','c'))
class(tst[,a])
tst[,as.factor(a)] #Returns expected result
tst[,as.factor('a'),with=FALSE] #Returns error
The latter command returns 'Error in Math.factor(j) : abs not meaningful for factors'. I ran into this when attempting tst[,lapply(cols, as.factor),with=FALSE], where cols was a collection of columns I wanted to convert to factors. Is there a solution or workaround for this?
Upvotes: 19
Views: 24884
Reputation: 118889
This is now fixed in v1.8.11, but probably not in the way you'd hoped for. From NEWS:
FR #4867 is now implemented. DT[, as.factor('x'), with=FALSE] where x is a column in DT, is now equivalent to DT[, "x", with=FALSE] instead of ending up with an error. Thanks to tresbot for reporting on SO: Converting multiple data.table columns to factors in R
Some explanation: The difference, when with=FALSE is used, is that the columns of the data.table aren't seen as variables anymore. That is:
tst[, as.factor(a), with=FALSE] # would give "a" not found!
would result in an error: "a" not found. But what you do instead is:
tst[, as.factor('a'), with=FALSE]
You're in fact creating a factor "a" with level="a" and asking to subset that column. This doesn't really make much sense. Take the case of data.frames:
DF <- data.frame(x=1:5, y=6:10)
DF[, c("x", "y")] # gives back DF
DF[, factor(c("x", "y"))] # gives back DF again, not factor columns
DF[, factor(c("x", "x"))] # gives back two columns of "x", still integer, not factor!
So, basically, when you use with=FALSE, you're applying factor() not to the elements of that column, but just to the column name. I hope I've managed to convey the difference well. Feel free to edit/comment if anything is unclear.
Upvotes: 4
Reputation: 1640
I found one solution:
library(data.table)
tst <- data.table('a' = c('b','b','c','c'))
class(tst[,a])
cols <- 'a'
tst[,(cols):=lapply(.SD, as.factor),.SDcols=cols]
Still, the earlier-mentioned behavior seems buggy.
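The same pattern should extend to several columns at once, which is what the original question was after. A small sketch, assuming a made-up table dt with hypothetical columns a, g and n:

```r
library(data.table)

# Hypothetical example table; column names are invented for illustration
dt <- data.table(
  a = c("b", "b", "c", "c"),
  g = c("x", "y", "x", "y"),
  n = 1:4
)

cols <- c("a", "g")  # the columns to convert

# (cols) on the left of := is evaluated to a character vector of names;
# .SDcols restricts .SD to those columns, so lapply() only touches them.
dt[, (cols) := lapply(.SD, as.factor), .SDcols = cols]

sapply(dt, class)  # inspect the resulting column classes
```

The update happens by reference, so dt itself is modified and the columns not listed in cols are left alone.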
Upvotes: 37