C8H10N4O2

Reputation: 18995

R - set reference level of factor to NA

I have a data.table with factor columns where some values are NA. I have deliberately included NA as a level of the factors (i.e., x <- factor(x, exclude=NULL), rather than the default behavior of x <- factor(x, exclude=NA)) because the NAs are meaningful for my model. For these factor columns, I wish to relevel() the reference level to NA, but I am struggling with the syntax.

# silly reproducible example
library(data.table)
a <- data.table(animal = c("turkey","platypus","dolphin"),
            mass_kg = c(8, 2, 200),
            egg_size= c("large","small",NA),
            intelligent=c(0,0,1)
            )
lr <- glm(intelligent ~ mass_kg + egg_size, data=a, family = binomial)
summary(lr) 

# By default, egg_size is converted to a factor with no level for NA
# However, in this case NA is meaningful (since most mammals don't lay eggs)

a[,egg_size:=factor(egg_size, exclude=NULL) ] # exclude=NULL allows an NA level

lr <- glm(intelligent ~ mass_kg + egg_size, data=a, family = binomial)
summary(lr) # Now NA is included in the model, but not as the reference level

a[,levels(egg_size)] # Returns: [1] "large" "small" NA    

a[,egg_size:=relevel(egg_size,ref=NA)]
# Returns:
# Error in relevel.factor(egg_size, ref = NA) : 
#   'ref' must be an existing level

What is the correct syntax for relevel(), or do I need to use something else? Thanks much.

Upvotes: 2

Views: 2242

Answers (1)

eddi

Reputation: 49448

You have to pass an NA of the correct type, which is NA_character_ (i.e., relevel(egg_size, ref = NA_character_)), but relevel() then drops the NA level, which is presumably a bug. A workaround is to specify the levels directly yourself:

# throw out NA's to begin with
egg_size = factor(c("large","small",NA), exclude = NA)

# but then add them back at the beginning
factor(egg_size, c(NA, levels(egg_size)), exclude = NULL)
#[1] large small <NA> 
#Levels: <NA> large small
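
The same idea can be wrapped in a small helper so it works on a factor that already has an NA level. This is only a sketch: na_first is a hypothetical name (not a base R or data.table function), and it assumes you always want an NA level to exist, even when the input contains no NAs.

```r
# Hypothetical helper: rebuild the factor with NA as the first (reference) level
na_first <- function(f) {
  stopifnot(is.factor(f))
  lev <- levels(f)
  # Put NA in front of the non-NA levels; exclude = NULL keeps the NA level
  factor(f, levels = c(NA, lev[!is.na(lev)]), exclude = NULL)
}

egg_size <- factor(c("large", "small", NA), exclude = NULL)
egg_size <- na_first(egg_size)
levels(egg_size)
```

With the data.table from the question, this could presumably be applied as a[, egg_size := na_first(egg_size)].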

In case you're wondering, c() coerces the NA from logical to the correct type (character) when it is combined with the character levels.
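
A quick way to check that coercion, using only base R (the variable name lev is just for illustration):

```r
typeof(NA)             # a bare NA is logical
typeof(NA_character_)  # the character-typed NA

# Combining NA with character strings coerces the whole vector to character
lev <- c(NA, "large", "small")
typeof(lev)
identical(lev[1], NA_character_)
```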

Upvotes: 3
