Reputation: 37
When I create a data.table with a string column and pass the data.frame parameter stringsAsFactors = F, the string column is correctly kept as character, but the resulting data.table also gains an extra column named "stringsAsFactors". It is easy enough to get rid of the extra column, but is there a way to tell data.table not to add a column based on the data.frame parameter? I.e., is this a bug or a feature? See the toy example below:
library(data.table)
factorTest <- sample(c('O','A', 'B','AB'), 50, replace = T)
summary(factorTest)
# Length Class Mode
# 50 character character
summary(as.factor(factorTest))
# A AB B O
# 10 18 7 15
test1 <- data.frame(dabo = factor(factorTest,
levels = c('O','A','B','AB')), dabostr = factorTest,
stringsAsFactors = F)
test2 <- data.table(dabo = factor(factorTest,
levels = c('O','A','B','AB')), dabostr = factorTest,
stringsAsFactors = F)
summary(test1)
# dabo dabostr
# O :15 Length:50
# A :10 Class :character
# B : 7 Mode :character
# AB:18
summary(test2)
# dabo dabostr stringsAsFactors
# O :15 Length:50 Mode :logical
# A :10 Class :character FALSE:50
# B : 7 Mode :character NA's :0
# AB:18
Upvotes: 0
Views: 59
Reputation: 16697
This was fixed in commit 3dbc493, and data.table() now has a fully functional stringsAsFactors argument.
When it is TRUE, data.table uses a fast internal as.factor function, because the base factor() is slow.
Below is your code, run on the latest data.table, 1.9.7:
library(data.table)
factorTest <- sample(c('O','A', 'B','AB'), 50, replace = T)
test1 <- data.frame(dabo = factor(factorTest,
levels = c('O','A','B','AB')), dabostr = factorTest,
stringsAsFactors = F)
test2 <- data.table(dabo = factor(factorTest,
levels = c('O','A','B','AB')), dabostr = factorTest,
stringsAsFactors = F)
summary(test1)
# dabo dabostr
# O : 8 Length:50
# A :10 Class :character
# B :16 Mode :character
# AB:16
summary(test2)
# dabo dabostr
# O : 8 Length:50
# A :10 Class :character
# B :16 Mode :character
# AB:16
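The TRUE case mentioned above is not exercised by the reproduced code, so here is a minimal sketch of both settings side by side. It assumes a data.table version that includes the fix (1.9.7 or later); the vector blood and the tables dt_chr and dt_fac are hypothetical names made up for illustration.

```r
library(data.table)

# Hypothetical toy data, not taken from the original post
blood <- c("O", "A", "B", "AB", "A")

# stringsAsFactors = FALSE keeps the character column as character
dt_chr <- data.table(id = 1:5, type = blood, stringsAsFactors = FALSE)

# stringsAsFactors = TRUE converts the character column to a factor
dt_fac <- data.table(id = 1:5, type = blood, stringsAsFactors = TRUE)

# Inspect the resulting column classes
sapply(dt_chr, class)
sapply(dt_fac, class)

# With the fix in place, the argument is consumed rather than stored,
# so only the two data columns should be present
names(dt_chr)
names(dt_fac)
```

If names() still lists a stringsAsFactors column, the installed data.table most likely predates the fix.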
Upvotes: 1