Reputation: 113
From what I can see here I would assume that data.table v1.8.0+ does not automatically convert strings to factors.
Specifically, to quote Matthew Dowle from that page:
No need for stringsAsFactors. Done like this in v1.8.0 : o character columns are now allowed in keys and are preferred to factor. data.table() and setkey() no longer coerce character to factor. Factors are still supported.
I'm not seeing that ... here's my R session transcript:
First, I make sure I have a recent enough version of data.table > 1.8.0
> library(data.table)
data.table 1.8.8 For help type: help("data.table")
Next, I create a 2x2 data.table. Notice that it creates factors ...
> m <- matrix(letters[1:4], ncol=2)
> str(data.table(m))
Classes ‘data.table’ and 'data.frame': 2 obs. of 2 variables:
$ V1: Factor w/ 2 levels "a","b": 1 2
$ V2: Factor w/ 2 levels "c","d": 1 2
- attr(*, ".internal.selfref")=<externalptr>
When I use stringsAsFactors in data.frame() and then call data.table(), all is well ...
> str(data.table(data.frame(m, stringsAsFactors=FALSE)))
Classes ‘data.table’ and 'data.frame': 2 obs. of 2 variables:
$ X1: chr "a" "b"
$ X2: chr "c" "d"
- attr(*, ".internal.selfref")=<externalptr>
What am I missing? Is data.frame() supposed to convert strings to factors, and if so, is there a "better way" of turning that behavior off?
Thanks!
Upvotes: 11
Views: 7556
Reputation: 115392
This issue seems to have slipped past somehow until now. Thanks to @fpinter for filing the issue recently. It is now fixed in commit 1322. From NEWS, No:39 under bug fixes for v1.9.3:
as.data.table.matrix
does not convert strings to factors by default.data.table
likes and prefers using character vectors to factors. Closes #745. Thanks to @fpinter for reporting the issue on the github issue tracker and to vijay for reporting here on SO.
It appears that this non-coercion is not yet implemented.
data.table
deals with matrix
arguments using as.data.table
if (is.matrix(xi) || is.data.frame(xi)) {
xi = as.data.table(xi, keep.rownames = keep.rownames)
x[[i]] = xi
numcols[i] = length(xi)
}
and
as.data.table.matrix
contains
if (mode(x) == "character") {
for (i in ic) value[[i]] <- as.factor(x[, i])
}
Might be worth reporting this to the bug tracker. (it is still implemented in 1.8.9, the current r-forge version)
Upvotes: 10
Reputation: 18437
As a workaround and to complete @mnel answer, if you want to turn off the default behavior of data.frame you can use the dedicated option.
options(stringsAsFactors=FALSE)
str(data.table(data.frame(m)))
Classes ‘data.table’ and 'data.frame': 2 obs. of 2 variables:
$ X1: chr "a" "b"
$ X2: chr "c" "d"
- attr(*, ".internal.selfref")=<externalptr>
Upvotes: 6