ra_learns

Reputation: 113

Using a regression model to generate predicted values with the broom package

I have a dataframe of student attributes and test scores, and I've created a linear model for each grade level (1 through 12). I am using the broom package to efficiently create a model for each grade level. Below is a simplified example dataset and the code I am using.

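library(tidyverse)  # dplyr, tidyr, purrr and the %>% pipe
library(broom)      # tidy(), augment(), glance()

# start df creation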

school_year <- rep(2017:2020, 120)
grade <- rep(1:12, each = 40)
attendance_rate <- round(runif(480, min=25, max=100), 1)
test_growth <- round(runif(480, min = -12, max = 38))
binary_flag <- round(runif(480, min = 0, max = 1))
score <- round(runif(480, min = 92, max = 370))
survey_response <- round(runif(480, min = 1, max = 4))

df <- data.frame(school_year, grade, attendance_rate, test_growth, binary_flag, score, survey_response) 

df$survey_response[df$grade == 1] <- NA

# end df creation

df_train <- df %>% filter(school_year != 2020)
df_test <- df %>% filter(school_year == 2020)


#create models
model <- df_train %>%
  group_by(grade) %>% 
  nest() %>% 
  mutate(fit = map(data, ~ if(all(is.na(.x$survey_response)))
    lm(score ~ attendance_rate + test_growth + binary_flag, data = .x) 
    else lm(score ~ attendance_rate + test_growth + binary_flag + survey_response, data = .x)),
    tidied = map(fit, tidy),
    augmented = map(fit, augment),
    glanced = map(fit, glance))

Once the models are trained, I want to use them to predict scores for the 2020 school year, i.e. the test dataset. (The augment function in the code above generates fitted values only for the observations in the training dataset.) The 1st grade model needs to be applied only to the 1st grade data in the test set, the 2nd grade model only to the 2nd grade data, and so on. For this reason, I haven't been able to get a basic predict(fit, df_test) to work.

How do I do that? Any help would be greatly appreciated.

Upvotes: 1

Views: 373

Answers (1)

Ronak Shah

Reputation: 388817

You can nest the test data by grade, join it with model on grade, and then call predict on each fit together with its matching test data.

library(tidyverse)
df_test %>%
   nest(test_data = -grade) %>%
   inner_join(model, by = 'grade') %>%
   mutate(result = map2(fit, test_data, predict))
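
If you want the predictions as ordinary rows next to the test observations rather than as a list column, one possible variation is to use broom's augment() with its newdata argument and then unnest. The following is a minimal, self-contained sketch, not the only way to do it. It rebuilds the example data from the question. The set.seed() call and the names predictions and predicted are my own additions for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

set.seed(42)  # assumption: seed added only so the sketch is reproducible

# Rebuild the example data from the question
df <- data.frame(
  school_year     = rep(2017:2020, 120),
  grade           = rep(1:12, each = 40),
  attendance_rate = round(runif(480, min = 25, max = 100), 1),
  test_growth     = round(runif(480, min = -12, max = 38)),
  binary_flag     = round(runif(480, min = 0, max = 1)),
  score           = round(runif(480, min = 92, max = 370)),
  survey_response = round(runif(480, min = 1, max = 4))
)
df$survey_response[df$grade == 1] <- NA

df_train <- df %>% filter(school_year != 2020)
df_test  <- df %>% filter(school_year == 2020)

# One model per grade; leave out survey_response where it is entirely missing
model <- df_train %>%
  nest(data = -grade) %>%
  mutate(fit = map(data, ~ if (all(is.na(.x$survey_response))) {
    lm(score ~ attendance_rate + test_growth + binary_flag, data = .x)
  } else {
    lm(score ~ attendance_rate + test_growth + binary_flag + survey_response,
       data = .x)
  }))

# Pair each grade's fit with that grade's test rows, predict, then flatten
predictions <- df_test %>%
  nest(test_data = -grade) %>%
  inner_join(select(model, grade, fit), by = "grade") %>%
  mutate(predicted = map2(fit, test_data, ~ augment(.x, newdata = .y))) %>%
  select(grade, predicted) %>%
  unnest(predicted)

head(predictions)
```

The result should have one row per test observation, with the prediction in the .fitted column alongside the original test columns. Note that inner_join silently drops any grade that appears in only one of the two tables, so it may be worth checking that nrow(predictions) matches nrow(df_test).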

Upvotes: 3
