Reputation: 113
I have a dataframe of student attributes and test scores, and I've created a linear model for each grade level (1 through 12). I am using the broom package to efficiently create a model for each grade level. Below is a simplified example dataset and the code I am using.
library(tidyverse)
library(broom)
# start df creation
school_year <- rep(2017:2020, 120)
grade <- rep(1:12, each = 40)
attendance_rate <- round(runif(480, min=25, max=100), 1)
test_growth <- round(runif(480, min = -12, max = 38))
binary_flag <- round(runif(480, min = 0, max = 1))
score <- round(runif(480, min = 92, max = 370))
survey_response <- round(runif(480, min = 1, max = 4))
df <- data.frame(school_year, grade, attendance_rate, test_growth, binary_flag, score, survey_response)
df$survey_response[df$grade == 1] <- NA
# end df creation
df_train <- df %>% filter(school_year != 2020)
df_test <- df %>% filter(school_year == 2020)
# create models
model <- df_train %>%
  group_by(grade) %>%
  nest() %>%
  mutate(
    # grade 1 has no survey responses, so that predictor is dropped there
    fit = map(data, ~ if (all(is.na(.x$survey_response))) {
      lm(score ~ attendance_rate + test_growth + binary_flag, data = .x)
    } else {
      lm(score ~ attendance_rate + test_growth + binary_flag + survey_response, data = .x)
    }),
    tidied = map(fit, tidy),
    augmented = map(fit, augment),
    glanced = map(fit, glance)
  )
Once I train the models, I want to use them to predict scores for the 2020 school year (the test dataset). (The augment function in the code above generates fitted values only for the observations in the training dataset.) The 1st grade model needs to be applied only to the 1st grade data in the test set, the 2nd grade model only to the 2nd grade data, and so on. For this reason, I haven't been able to get the basic predict(fit, df_test) to work.
How do I do that? Any help would be greatly appreciated.
Upvotes: 1
Views: 373
Reputation: 388817
You can nest the test data, join it with model by grade, and predict.
library(tidyverse)
df_test %>%
  nest(test_data = -grade) %>%
  inner_join(model, by = 'grade') %>%
  mutate(result = map2(fit, test_data, predict))
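If you want each prediction next to the test row it belongs to, rather than as a list-column, one option is to add the predictions inside each nested test set and then unnest. The following is only a sketch on a small made-up dataset (three grades, a single predictor), not the data from the question; the column name predicted is an arbitrary choice.

```r
library(tidyverse)

# Made-up data for illustration only: 3 grades, 4 school years
set.seed(1)
df <- tibble(
  school_year = rep(2017:2020, times = 30),
  grade = rep(1:3, each = 40),
  attendance_rate = runif(120, min = 25, max = 100),
  score = round(runif(120, min = 92, max = 370))
)

df_train <- filter(df, school_year != 2020)
df_test <- filter(df, school_year == 2020)

# One model per grade, stored in a list-column
model <- df_train %>%
  nest(data = -grade) %>%
  mutate(fit = map(data, ~ lm(score ~ attendance_rate, data = .x)))

# Nest the test data, join on grade, add each grade's predictions to
# its own test rows, then unnest back to one row per student
predictions <- df_test %>%
  nest(test_data = -grade) %>%
  inner_join(select(model, grade, fit), by = "grade") %>%
  mutate(test_data = map2(
    fit, test_data,
    ~ mutate(.y, predicted = unname(predict(.x, newdata = .y)))
  )) %>%
  select(-fit) %>%
  unnest(test_data)
```

Here predictions has one row per test observation, with the original columns plus predicted, so comparing predicted to score within each grade is a plain group_by() and summarise().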
Upvotes: 3