Using regression model to generate predicted values with broom package

Question

I have a dataframe of student attributes and test scores, and I've created a linear model for each grade level (1 through 12). I am using the broom package to efficiently create a model for each grade level. Below is a simplified example dataset and the code I am using.

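library(tidyverse)  # dplyr, tidyr, purrr
library(broom)      # tidy(), augment(), glance()

# start df creation
set.seed(1)  # fixed seed so the random example data is reproducible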

school_year <- rep(2017:2020, 120)
grade <- rep(1:12, each = 40)
attendance_rate <- round(runif(480, min=25, max=100), 1)
test_growth <- round(runif(480, min = -12, max = 38))
binary_flag <- round(runif(480, min = 0, max = 1))
score <- round(runif(480, min = 92, max = 370))
survey_response <- round(runif(480, min = 1, max = 4))

df <- data.frame(school_year, grade, attendance_rate, test_growth, binary_flag, score, survey_response) 

df$survey_response[df$grade == 1] <- NA

# end df creation

df_train <- df %>% filter(!(school_year == 2020))
df_test <- df %>% filter(school_year == 2020)


# create one model per grade; grade 1 has no survey_response, so that term is dropped there
model <- df_train %>%
  group_by(grade) %>% 
  nest() %>% 
  mutate(fit = map(data, ~ if(all(is.na(.x$survey_response)))
    lm(score ~ attendance_rate + test_growth + binary_flag, data = .x) 
    else lm(score ~ attendance_rate + test_growth + binary_flag + survey_response, data = .x)),
    tidied = map(fit, tidy),
    augmented = map(fit, augment),
    glanced = map(fit, glance))

Once I train the models, I want to use them to predict scores for the 2020 school year (the test dataset). (The augment function in the code above generates fitted values only for the observations in the training dataset.) The 1st grade model should be applied only to the 1st grade rows of the test set, the 2nd grade model only to the 2nd grade rows, and so on. Because of this, I haven't been able to get a basic predict(fit, df_test) to work.

How do I do that? Any help would be greatly appreciated.

Ronak Shah · Accepted Answer

You can nest the test data by grade, join it to model by grade, and then call predict on each pair of fitted model and test data.

library(tidyverse)
df_test %>%
   nest(test_data = -grade) %>%
   inner_join(model, by = 'grade') %>%
   mutate(result = map2(fit, test_data, predict))
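If a flat data frame with one row per test observation is more convenient than a list column of prediction vectors, one possible variation is to call broom's augment with its newdata argument instead of predict, then unnest. The following is only a sketch on a small toy dataset: the names toy, toy_train, toy_test, toy_model and toy_pred, the student column, the seed and the simplified formula are assumptions made for illustration and are not part of the question.

```r
library(tidyverse)
library(broom)

set.seed(42)  # assumption: fixed seed so the toy data is reproducible

# Toy data (hypothetical): 3 grades, 4 school years, 10 students per grade and year
toy <- expand_grid(grade = 1:3, school_year = 2017:2020, student = 1:10) %>%
  mutate(
    attendance_rate = round(runif(n(), min = 25, max = 100), 1),
    test_growth     = round(runif(n(), min = -12, max = 38)),
    score           = round(runif(n(), min = 92, max = 370))
  )

toy_train <- filter(toy, school_year != 2020)
toy_test  <- filter(toy, school_year == 2020)

# One linear model per grade, fitted on the training years only
toy_model <- toy_train %>%
  nest(data = -grade) %>%
  mutate(fit = map(data, ~ lm(score ~ attendance_rate + test_growth, data = .x)))

# Nest the test rows by grade, join each grade to its own model,
# and let augment() add a .fitted column holding the predicted score
toy_pred <- toy_test %>%
  nest(test_data = -grade) %>%
  inner_join(toy_model, by = "grade") %>%
  mutate(augmented = map2(fit, test_data, ~ augment(.x, newdata = .y))) %>%
  select(grade, augmented) %>%
  unnest(augmented)
```

Here toy_pred keeps the original test columns next to .fitted, so predicted and actual scores can be compared row by row. The same map2 call should carry over to the model and df_test objects from the question, since each grade's model is only ever paired with that grade's test rows.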
