Reputation: 2575
I am running a regression on nested data that contains factor variables. If a factor has only one level within one of the groups, the regression fails with the error "contrasts can be applied only to factors with 2 or more levels". For example:
library(tidyverse)

data <- mtcars %>%
  mutate(am = if_else(carb == 1, 1, am),
         am = as.factor(am))

data_carb <- data %>%
  group_by(carb) %>%
  nest()

X <- c("cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear")
Y <- "mpg"

generic_model <- function(df) {
  lm(reformulate(X, Y), data = df)
}

modelondata <- data_carb %>%
  mutate(model = data %>% map(generic_model),
         coeff = model %>% map(broom::tidy)) %>%
  unnest(coeff, .drop = TRUE)
How can I keep the variable as a factor and still get output for at least those groups in which the factor has more than one level, i.e. for carb != 1?
In my real data, I have many factor variables with dozens of levels, and the regression fails even if just one group has a constant factor level. I don't want to drop the variables entirely, because I would lose insight into the other groups as well.
Upvotes: 2
Views: 516
Reputation: 206616
What if you created a function to drop columns with "fixed" factors, i.e. factors that take only one value within a group:
drop_fixed_factors <- function(x) {
  # keep non-factor columns, and factors with at least two distinct values
  x %>% keep(~ !is.factor(.x) || length(unique(.x)) > 1)
}
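To see what such a helper does, here is a small sketch on a made-up data frame (`toy` and its values are invented for illustration). It assumes the threshold should be "more than one distinct value", since the error message asks for 2 or more levels, and it loads purrr for `keep()` and `%>%`.

```r
library(purrr)

# Keep non-factor columns, and factors with at least two distinct values
drop_fixed_factors <- function(x) {
  x %>% keep(~ !is.factor(.x) || length(unique(.x)) > 1)
}

# Hypothetical one-group data frame
toy <- data.frame(
  wt = c(2.1, 3.4, 2.8),
  am = factor(c("1", "1", "1")),  # constant within this group
  vs = factor(c("0", "1", "0"))   # two observed values
)

kept <- names(drop_fixed_factors(toy))
print(kept)
# [1] "wt" "vs"
```

The constant factor `am` is removed, while the numeric column and the two-level factor survive.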
Then do something like this:
generic_model <- function(df) {
  good_data <- df[X] %>% drop_fixed_factors()
  lm(reformulate(names(good_data), Y), data = df)
}
That way each group's model keeps only the columns that actually vary within that group, and the other groups still use the factor.
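Putting it together on the question's `mtcars` setup, a rough sketch might look like the following. It assumes a `> 1` threshold in the helper and a recent dplyr/tidyr/purrr; the `terms_used` column is a name made up here to show which predictors each group ended up with, in place of the `broom::tidy` step.

```r
library(dplyr)
library(tidyr)
library(purrr)

X <- c("cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear")
Y <- "mpg"

# Same setup as in the question: am is constant whenever carb == 1
data <- mtcars %>%
  mutate(am = if_else(carb == 1, 1, am),
         am = as.factor(am))

# Keep non-factor columns, and factors with at least two distinct values
drop_fixed_factors <- function(x) {
  x %>% keep(~ !is.factor(.x) || length(unique(.x)) > 1)
}

# Fit on whichever predictors survive for this particular group
generic_model <- function(df) {
  good_data <- df[X] %>% drop_fixed_factors()
  lm(reformulate(names(good_data), Y), data = df)
}

models <- data %>%
  group_by(carb) %>%
  nest() %>%
  mutate(model = map(data, generic_model),
         # terms_used is an illustrative column: the predictors in each fit
         terms_used = map_chr(model, ~ paste(attr(terms(.x), "term.labels"),
                                             collapse = " + "))) %>%
  ungroup() %>%
  select(carb, terms_used)

print(models)
```

Groups where `am` is constant (such as carb == 1) are fitted without it, while groups where both levels occur keep it. Very small groups still produce `NA` coefficients for predictors that cannot be estimated, but `lm` no longer stops with the contrasts error.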
Upvotes: 2