L. G. Hunsicker

data.table with data.frame parameter creates extra column

When I create a string column with data.table() and pass the data.frame parameter stringsAsFactors = F, the resulting data.table applies stringsAsFactors = F correctly, but it then adds an extra column named "stringsAsFactors". It is easy enough to get rid of the extra column, but is there a way to tell data.table not to add a column based on the data.frame parameter? I.e., is this a bug or a feature? See the toy example below:

library(data.table)
factorTest <- sample(c('O','A', 'B','AB'), 50, replace = TRUE)
summary(factorTest)
#    Length     Class      Mode
#        50 character character
summary(as.factor(factorTest))
#  A AB  B  O
# 10 18  7 15
test1 <- data.frame(dabo = factor(factorTest,
     levels = c('O','A','B','AB')), dabostr = factorTest,
     stringsAsFactors = FALSE)
test2 <- data.table(dabo = factor(factorTest,
     levels = c('O','A','B','AB')), dabostr = factorTest,
     stringsAsFactors = FALSE)
summary(test1)
#  dabo      dabostr
#  O :15   Length:50
#  A :10   Class :character
#  B : 7   Mode  :character
#  AB:18
summary(test2)
#  dabo      dabostr          stringsAsFactors
#  O :15   Length:50          Mode :logical
#  A :10   Class :character   FALSE:50
#  B : 7   Mode  :character   NA's :0
#  AB:18
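
On a data.table version that shows this behaviour, the stray column can be dropped by reference. The sketch below assumes the extra column is literally named stringsAsFactors, as in the summary above; the seed and the guard are illustrative additions, and on versions that handle the argument properly the guard simply skips the deletion.

library(data.table)

set.seed(1)  # arbitrary seed, only to make the sample reproducible
factorTest <- sample(c("O", "A", "B", "AB"), 50, replace = TRUE)
test2 <- data.table(dabo = factor(factorTest, levels = c("O", "A", "B", "AB")),
                    dabostr = factorTest,
                    stringsAsFactors = FALSE)

# Affected versions turned the argument into a column; if it is there,
# delete it by reference (assigning NULL with := removes a column).
if ("stringsAsFactors" %in% names(test2)) {
  test2[, stringsAsFactors := NULL]
}

str(test2)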

Answers (1)

jangorecki

This was fixed in commit 3dbc493, and data.table() now has a fully functional stringsAsFactors argument.
When TRUE, it uses a fast internal as.factor function, because base factor() is slow.
Below is your code reproduced on the latest data.table, 1.9.7.

library(data.table)
factorTest <- sample(c('O','A', 'B','AB'), 50, replace = TRUE)
test1 <- data.frame(dabo = factor(factorTest,
     levels = c('O','A','B','AB')), dabostr = factorTest,
     stringsAsFactors = FALSE)
test2 <- data.table(dabo = factor(factorTest,
     levels = c('O','A','B','AB')), dabostr = factorTest,
     stringsAsFactors = FALSE)
summary(test1)
# dabo      dabostr
# O : 8   Length:50
# A :10   Class :character
# B :16   Mode  :character
# AB:16
summary(test2)
# dabo      dabostr
# O : 8   Length:50
# A :10   Class :character
# B :16   Mode  :character
# AB:16
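
With a data.table release that includes the fix, the argument also works in the other direction: stringsAsFactors = TRUE converts character columns to factors, and in neither case does the argument become a column. A small sketch; the column name blood and the values are made up for illustration.

library(data.table)

dt_chr <- data.table(blood = c("O", "A", "AB", "A"), stringsAsFactors = FALSE)
dt_fct <- data.table(blood = c("O", "A", "AB", "A"), stringsAsFactors = TRUE)

# The argument is consumed rather than added as a column
names(dt_chr)        # "blood"
class(dt_chr$blood)  # "character"
class(dt_fct$blood)  # "factor"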
