I have a dataset where I'm trying to loop over a factor variable (location) and build a separate model for each level of that factor. Depending on the location, however, some factor variables have only a single level, which gives me this error:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
Called from: `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])
So, depending on the location, I want to drop any single-level factors from the model. I've tried splitting the data into one dataset that has no single-level factors and another that does, but I don't know how to drop a given factor variable depending on the location.
This will give you the error:
library(data.table)
dt <- data.table(df, key = "location")
lapply(unique(dt$location), function(z) lm(y ~ x1 + x2 + x3 + x4 + x5, data = dt[J(z),]))
I'm not very comfortable with data.table, however, so any non-data.table solution would be really helpful. Thank you.
Some data:
y <- rnorm(n = 100, mean = 50, sd = 5)
x1 <- rnorm(n = 100, mean = 10, sd = 3)
location <- factor(c(rep(1, 20), rep(2, 20), rep(3, 20), rep(4, 20), rep(5, 20)))
x2 <- rnorm(n = 100, mean = 25, sd = 3)
x3 <- factor(sample(c(0, 1), size = 100, replace = TRUE))
x4 <- factor(ifelse(location == 1, 0,
             ifelse(location == 2, sample(c(0, 1), size = 20, replace = TRUE),
             ifelse(location == 3, 1,
             ifelse(location == 4, 0, sample(c(0, 1), size = 20, replace = TRUE))))))
x5 <- factor(ifelse(location == 1, sample(c(0, 1), size = 20, replace = TRUE),
             ifelse(location == 2, 1,
             ifelse(location == 3, sample(c(0, 1), size = 20, replace = TRUE),
             ifelse(location == 4, sample(c(0, 1), size = 20, replace = TRUE), 0)))))
df <- data.frame(y, location, x1, x2, x3, x4, x5)
Upvotes: 0
Views: 170
Whether a variable can go into the model formula depends on whether it has at least two levels in the subset being fitted.
You can build the formula from these conditions (e.g., using ifelse()) and then pass it to the model inside lapply().
Here is a solution:
lapply(unique(df$location), function(z) {
  # Subset the data to the current location
  sub_df <- dplyr::filter(df, location == z)
  # Add x4 and x5 to the formula only if they vary within this subset
  form_x4 <- ifelse(length(unique(sub_df$x4)) > 1, "+ x4", "")
  form_x5 <- ifelse(length(unique(sub_df$x5)) > 1, "+ x5", "")
  form <- as.formula(paste("y ~ x1 + x2 + x3", form_x4, form_x5))
  lm(formula = form, data = sub_df)
})
The form object inside the lapply(...) call above combines the fixed part of the lm() formula with whichever optional variables meet the condition. If a variable has only a single level in a given subset, the ifelse() call returns an empty string, so that variable is simply left out of the formula.
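If you would rather not hard-code the check for each variable, the same idea can be generalized in base R. The following is only a sketch under some assumptions: fit_by_location() and its argument names are hypothetical, the small demo data frame is made up for illustration (it is not the data from the question), and the fallback to an intercept-only model when every predictor is dropped is my own choice.

```r
# Hypothetical helper: fit one lm() per level of `group`, leaving out any
# factor predictor that has fewer than two observed levels in that subset.
fit_by_location <- function(data, response, predictors, group) {
  lapply(split(data, data[[group]], drop = TRUE), function(sub_df) {
    # Keep a predictor unless it is a factor with fewer than two observed levels
    keep <- vapply(predictors, function(v) {
      col <- sub_df[[v]]
      !is.factor(col) || nlevels(droplevels(col)) > 1
    }, logical(1))
    # Assumption: fall back to an intercept-only model if nothing is left
    rhs <- if (any(keep)) predictors[keep] else "1"
    # reformulate() builds a formula such as y ~ x1 + x4 from character vectors
    lm(reformulate(rhs, response = response), data = sub_df)
  })
}

# Made-up demo data: x4 is constant in location 1 and varies in location 2
set.seed(1)
demo <- data.frame(
  y        = rnorm(40),
  location = factor(rep(1:2, each = 20)),
  x1       = rnorm(40),
  x4       = factor(c(rep(0, 20), rep(c(0, 1), 10)))
)

models <- fit_by_location(demo, "y", c("x1", "x4"), "location")
lapply(models, formula)  # inspect which terms each model ended up with
```

With the data from the question, the call would presumably be fit_by_location(df, "y", c("x1", "x2", "x3", "x4", "x5"), "location"). Because the check runs over every predictor, it also covers the case where x3 happens to be constant within a location.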
Upvotes: 1