PugFanatic

Reputation: 31

Regression model function (with user selected variables) on subset of data frame

Using data from the fivethirtyeight package...

library(fivethirtyeight)
grads <- college_recent_grads

I created a subset of the grads data containing only the variables I need:

data <- grads[, c("men", "major_category", "employed", 
"employed_fulltime_yearround", "p25th", 
"p75th", "total")]

Then, I split the data subset up by major category and omitted the one NA value in the data

majorcats <- split(data, data$major_category)
names(majorcats)
# na.omit() on a plain list returns it unchanged, so drop NA rows per data frame
majorcats <- lapply(majorcats, na.omit)

I then tried to run a regression model inside a function called facts, where the user specifies x, y, and z, with z being a major category (which is why I split the data subset by major_category):

facts <- function(x, y, z){
   category <- majorcats[["z"]]
   summary(lm(y ~ x, data = category))
 }

Unfortunately, when I call facts with variables that are part of the majorcats data set, such as

facts(men, p25th, Arts)

I get the error below:

Error in model.frame.default(formula = y ~ x, data = category, 
drop.unused.levels = TRUE) : 
  invalid type (NULL) for variable 'y'
Called from: model.frame.default(formula = y ~ x, data = category, 
drop.unused.levels = TRUE)
Browse[1]> 

Can someone please explain what this error means, and how I might be able to fix it?

Upvotes: 0

Views: 84

Answers (1)

Parfait

Reputation: 107567

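Two things go wrong in the original function. First, majorcats[["z"]] looks up an element literally named "z", which does not exist, so category ends up NULL. Second, the formula y ~ x refers to variables literally named y and x, not to the columns you passed in. Pass the parameters as string literals and build the formula from those strings: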

facts <- function(x, y, z){
   category <- majorcats[[z]]

   model <- as.formula(paste(y, "~", x))
   # ALTERNATIVE: model <- reformulate(x, response=y)
   summary(lm(model, data = category))
 }

facts("men", "p25th", "Arts")
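
If you want to try the same idea without installing fivethirtyeight, here is a minimal sketch on the built-in mtcars data. It assumes mtcars split by cyl as a stand-in for majorcats; fit_by_group, groups, and data_list are hypothetical names used only for illustration. It also stops early when the group name is not in the list, since [[ would otherwise return NULL without complaint.

```r
# Hypothetical stand-in data: mtcars split by cylinder count,
# playing the role of majorcats split by major_category.
groups <- split(mtcars, mtcars$cyl)

# fit_by_group is an illustrative name, not part of the original post.
fit_by_group <- function(x, y, z, data_list = groups) {
  # Fail early with a clear message if the group name is unknown
  if (!z %in% names(data_list)) {
    stop("unknown group: ", z)
  }
  # reformulate("wt", response = "mpg") builds the formula mpg ~ wt
  model <- reformulate(x, response = y)
  lm(model, data = data_list[[z]])
}

# All three arguments are strings; "4" is the name of the 4-cylinder group
fit <- fit_by_group("wt", "mpg", "4")
summary(fit)
```

Returning the lm object instead of its summary is a design choice in this sketch: the caller can still run summary(), and can also use coef() or predict() on the same fit.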

Upvotes: 1
