user12310746

Reputation: 279

lm() looped over factor variable while dropping single-level factor variables from the model

I have a dataset where I'm trying to loop over a factor variable (location) and build a separate model for each level of that factor. Within some locations, however, certain factor variables have only a single level, which gives me this error:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels
Called from: `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])

So depending on the location, I want to drop any single-level factors from the model. I've tried splitting the data into one dataset that doesn't have any single-level factors and another that does, but I don't know how to drop a given factor variable depending on the location.

This reproduces the error (using the df defined under "Some data" below):

library(data.table)

dt <- data.table(df, key = "location")
lapply(unique(dt$location), function(z) lm(y ~ x1 + x2 + x3 + x4 + x5, data = dt[J(z),]))

I'm not very comfortable with data.table, however, so any non-data.table solution would be really helpful. Thank you.

Some data:


y <- rnorm(n = 100, mean = 50, sd = 5)
x1 <- rnorm(n = 100, mean = 10, sd = 3)
location <- factor(c(rep(1, 20), rep(2, 20), rep(3, 20), rep(4, 20), rep(5, 20)))
x2 <- rnorm(n = 100, mean = 25, sd = 3)
x3 <- factor(sample(c(0, 1), size = 100, replace = TRUE))
x4 <- factor(ifelse(location == 1, 0, 
                ifelse(location == 2, sample(c(0, 1), size = 20, replace = TRUE), 
                       ifelse(location == 3, 1, 
                              ifelse(location == 4, 0, sample(c(0, 1), size = 20, replace = TRUE))))))
x5 <- factor(ifelse(location == 1, sample(c(0, 1), size = 20, replace = TRUE),
                     ifelse(location == 2, 1, 
                            ifelse(location == 3, sample(c(0, 1), size = 20, replace = TRUE),  
                                   ifelse(location == 4, sample(c(0, 1), size = 20, replace = TRUE), 0)))))

df <- data.frame(y, location, x1, x2, x3, x4, x5) 

Upvotes: 0

Views: 170

Answers (1)

LC-datascientist

Reputation: 2096

Your model's formula needs to be conditional: an independent variable should be included only if it has at least two levels within the subset being modelled.
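To see which variables are affected, it can help to count the distinct values of each factor within each location before fitting anything. The following is only a minimal sketch on a small made-up data frame (`toy` and `n_levels` are illustrative names, not part of the question's data); the same idea should carry over to `df` with `x3`, `x4` and `x5`.

```r
# Small illustrative data set (made up): x4 is constant within location 1
toy <- data.frame(
  location = factor(rep(1:2, each = 4)),
  x3 = factor(c(0, 1, 0, 1, 1, 0, 1, 0)),
  x4 = factor(c(0, 0, 0, 0, 0, 1, 1, 0))
)

# Count the distinct values of each factor within each location.
# Rows are variables, columns are locations.
n_levels <- sapply(split(toy, toy$location), function(d) {
  sapply(d[c("x3", "x4")], function(col) length(unique(col)))
})
n_levels
```

Any entry equal to 1 marks a variable that would need to be left out of the formula for that location.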

You can create a formula based on these conditions (e.g., using ifelse()) and then feed the formula to the model inside lapply().

Here is a solution:

lapply(unique(df$location), function(z) {
    sub_df <- dplyr::filter(df, location == z) # subset by location
    # Add x4 / x5 only if they take more than one value in this subset
    form_x4 <- ifelse(length(unique(sub_df$x4)) > 1, "+ x4", "")
    form_x5 <- ifelse(length(unique(sub_df$x5)) > 1, "+ x5", "")
    # Build the formula from the fixed terms plus any optional terms
    form <- as.formula(paste("y ~ x1 + x2 + x3", form_x4, form_x5))
    lm(formula = form, data = sub_df)
})

The form object inside the lapply(...) call pastes together the fixed part of the lm() formula (y ~ x1 + x2 + x3) and whichever optional terms pass their check. If a variable has only a single level in a given subset, its ifelse() returns an empty string, so the variable is simply left out of that location's formula.
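If you would rather not write one check per variable, the same idea can be generalised. The following is a sketch rather than a drop-in solution: `fit_varying` is a hypothetical helper name, the example data are made up, and it assumes a plain data.frame (not a data.table) with no missing values. It uses base R's reformulate() to build the formula from whichever predictors take more than one value in the subset.

```r
# Hypothetical helper: fit lm() on one subset, keeping only predictors that vary
fit_varying <- function(sub_df, response = "y",
                        predictors = c("x1", "x2", "x3", "x4", "x5")) {
  # A predictor is usable only if it takes more than one distinct value here
  varies <- vapply(sub_df[predictors],
                   function(col) length(unique(col)) > 1,
                   logical(1))
  # Build e.g. y ~ x1 + x4 from the surviving predictor names
  form <- reformulate(predictors[varies], response = response)
  lm(form, data = sub_df)
}

# Made-up example data: x4 is constant in location 1 and varies in location 2
set.seed(1)
df <- data.frame(
  y = rnorm(40),
  location = factor(rep(1:2, each = 20)),
  x1 = rnorm(40),
  x4 = factor(c(rep(0, 20), rep(c(0, 1), 10)))
)

# One model per location; split() gives a list of data frames named by level
fits <- lapply(split(df, df$location), fit_varying, predictors = c("x1", "x4"))
lapply(fits, formula)
```

Note that reformulate() fails if every predictor is dropped, so a subset in which nothing varies would need separate handling.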

Upvotes: 1
