Geet

How to get regression output in R ignoring one factor level in data?

I am running regressions on nested data that contain factor variables. If a factor takes only one level within one of the groups, the regression fails with the error "contrasts can be applied only to factors with 2 or more levels". For example:

library(dplyr)
library(purrr)
library(tidyr)

data <- mtcars %>% mutate(am = if_else(carb == 1, 1, am),
                          am = as.factor(am))

data_carb <- data %>%
  group_by(carb) %>% 
  nest()

X <- c("cyl", "disp", "hp" , "drat", "wt", "qsec", "vs", "am", "gear")
Y <- "mpg"

generic_model <- function(df) {
  lm(reformulate(X, Y), data = df)
}

modelondata <- data_carb %>%
  mutate(model = data %>% map(generic_model),
         coeff = model %>% map(broom::tidy)) %>%
  select(carb, coeff) %>%  # replaces unnest(.drop = TRUE), deprecated in tidyr >= 1.0
  unnest(coeff)
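The failure can be reproduced on a single group with base R alone. The sketch below reuses the question's recoding of `am` (the names `cars_df`, `carb1` and `err` are only for illustration). Inside `carb == 1` every car has `am == 1`, and because `lm()` drops unused factor levels when it builds the model frame, `am` is left with a single level and no contrasts can be set up.

```r
# Minimal reproduction of the error on one group, base R only.
cars_df <- mtcars
cars_df$am <- as.factor(ifelse(cars_df$carb == 1, 1, cars_df$am))  # question's recoding

carb1 <- cars_df[cars_df$carb == 1, ]
# levels(carb1$am) is still c("0", "1"), but only "1" is observed;
# lm() drops the unused level before computing contrasts.
print(unique(carb1$am))

# Capture the error message instead of stopping.
err <- tryCatch(lm(mpg ~ wt + am, data = carb1),
                error = function(e) conditionMessage(e))
print(err)
```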

How can I keep the variable as a factor and still get the output for at least those groups in which the factor has more than one level, i.e. for carb != 1?

In my real data, I have many factor variables with dozens of levels, and the regression fails as soon as a single group has a factor with a constant level. So I don't want to drop those variables entirely, because I would also lose the insights they provide for the other groups.
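To see in advance which groups would trigger this error, one option is to count the distinct observed values of every factor column per group. Below is a minimal base-R sketch on `mtcars`. Turning `vs` into a factor is an assumption added here only so that there are two factor columns to tabulate, and `level_counts` is a hypothetical name.

```r
# Count distinct observed values of each factor column within each carb group.
cars_df <- mtcars
cars_df$am <- as.factor(ifelse(cars_df$carb == 1, 1, cars_df$am))
cars_df$vs <- as.factor(cars_df$vs)  # hypothetical second factor, for illustration

factor_cols <- names(cars_df)[vapply(cars_df, is.factor, logical(1))]

# Rows = groups, columns = factor variables; a 1 means lm() would fail there.
level_counts <- t(sapply(
  split(cars_df[factor_cols], cars_df$carb),
  function(g) vapply(g, function(col) length(unique(col)), integer(1))
))
print(level_counts)
```

In this example, `am` is constant in the `carb == 1` group but takes both values in groups such as `carb == 2`.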

Answers (1)

MrFlick

What if you created a function that drops columns containing "fixed" factors, i.e. factors that take only one distinct value in the data?

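drop_fixed_factors <- function(x) {
  # Keep non-factor columns, and factors with at least 2 distinct values
  # (lm() needs 2 or more levels to build contrasts)
  x %>% keep(~ !is.factor(.x) || length(unique(.x)) > 1)
}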
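The helper's logic can also be checked on a toy data frame. This is a base-R counterpart that swaps `purrr::keep()` for `Filter()`; the name `drop_fixed_factors_base` and the toy columns are hypothetical. Note the threshold: `lm()` needs at least two observed levels to build contrasts, so a factor is kept when it has more than one distinct value.

```r
# Base-R counterpart: keep a column unless it is a factor with a single
# distinct observed value.
drop_fixed_factors_base <- function(x) {
  Filter(function(col) !is.factor(col) || length(unique(col)) > 1, x)
}

toy <- data.frame(
  num   = c(1.5, 2.0, 3.5),                                # numeric: always kept
  fixed = factor(c("a", "a", "a"), levels = c("a", "b")),  # one observed value
  two   = factor(c("x", "y", "x"))                         # two observed values
)
print(names(drop_fixed_factors_base(toy)))
```

The unused level `"b"` of `fixed` does not save it from being dropped, since only observed values are counted, which matches what `lm()` sees after dropping unused levels.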

Then use it inside the modelling function, like this:

generic_model <- function(df) {
  # Remove predictors that are factors with a single level in this group
  good_data <- df[X] %>% drop_fixed_factors()
  lm(reformulate(names(good_data), Y), data = df)
}

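This way, each group's model keeps only the predictors that can be estimated there: numeric columns are always kept, and a factor is dropped only in the groups where it has a single level, so it still contributes to the models for the other groups.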
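For readers without the tidyverse, the same per-group approach can be sketched in base R with `split()` and `lapply()`. The names `cars_df`, `fit_group()`, `models` and `has_am` are illustrative and not part of the answer. The sketch also reports which groups kept `am` as a predictor.

```r
# Per-group fitting in base R, dropping one-level factors before lm().
cars_df <- mtcars
cars_df$am <- as.factor(ifelse(cars_df$carb == 1, 1, cars_df$am))

X <- c("cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear")
Y <- "mpg"

fit_group <- function(df) {
  # TRUE for predictors to keep: non-factors, or factors with > 1 observed value
  use <- vapply(df[X],
                function(col) !is.factor(col) || length(unique(col)) > 1,
                logical(1))
  lm(reformulate(X[use], Y), data = df)
}

# One model per carb group; names are "1", "2", "3", "4", "6", "8".
models <- lapply(split(cars_df, cars_df$carb), fit_group)

# Did am survive as a predictor in each group's model?
has_am <- vapply(models,
                 function(m) "am" %in% attr(terms(m), "term.labels"),
                 logical(1))
print(has_am)
```

Every group now yields an `lm` fit, and `am` is kept only in the `carb == 2` and `carb == 4` groups, where both of its levels occur. The single-row groups (`carb == 6` and `carb == 8`) still fit, but most of their coefficients are `NA`, so they carry little information.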