Reputation: 113
I have a dataframe of student attributes and test scores, and I am trying to fit a linear model for each grade level (1 through 12). I am using the broom package to efficiently create a model for each grade level. Below is a simplified example dataset and the code I am using.
#start df creation
grade <- rep(1:12, each = 40)
attendance_rate <- round(runif(480, min=25, max=100), 1)
test_growth <- round(runif(480, min = -12, max = 38))
binary_flag <- round(runif(480, min = 0, max = 1))
score <- round(runif(480, min = 92, max = 370))
survey_response <- round(runif(480, min = 1, max = 4))
df <- data.frame(grade, attendance_rate, test_growth, binary_flag, score, survey_response)
df$survey_response[df$grade == 1] <- NA
# end df creation
#create train test split for each grade level
set.seed(123)
rows_by_grade <- split(seq_len(nrow(df)), df$grade)
df_train <- lapply(rows_by_grade, function(x) sample(x, floor(.6*length(x))))
# SIMPLIFY = FALSE keeps a list of index vectors, one per grade
df_test <- mapply(function(x,y) setdiff(x,y), x = rows_by_grade, y = df_train, SIMPLIFY = FALSE)
df_train <- df[unlist(df_train),]
df_test <- df[unlist(df_test),]
#create models
library(dplyr)
library(tidyr)
library(purrr)
library(broom)
models_nested <- df_train %>%
  group_by(grade) %>%
  nest() %>%
  mutate(
    fit = map(data, ~ lm(score ~ attendance_rate + test_growth + binary_flag + survey_response, data = .x)),
    tidied = map(fit, tidy),
    augmented = map(fit, augment),
    glanced = map(fit, glance)
  )
Unfortunately, when I try to run the code block that begins with models_nested, I receive the following error:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases
I know this is happening because all students in 1st grade have an NA value in the survey_response column. I do not know how to resolve this without running a separate regression for 1st grade that drops the survey response column/variable entirely. Is there a way to tell the lm function to simply ignore a variable if that particular grade subset only contains null values? I obviously want to keep that variable in the regression for the other grade level models.
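The diagnosis can be reproduced in isolation. Below is a minimal sketch, assuming a made-up data frame (the names toy, y, x and z are hypothetical and not part of the dataset above): when one predictor is NA in every row, lm's default NA handling leaves no complete rows to fit.

```r
# Hypothetical toy data: predictor z has no observed values
toy <- data.frame(y = c(3, 5, 4, 8, 9), x = 1:5, z = NA_real_)

# With the default na.action (na.omit), lm() drops every row that has an NA
# in any model variable, so here no rows survive
complete_rows <- sum(complete.cases(toy[, c("y", "x", "z")]))

# Capture the error message instead of stopping the script
msg <- tryCatch(
  {
    lm(y ~ x + z, data = toy)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(complete_rows)
print(msg)
```

The same thing happens inside the grade 1 group, where survey_response plays the role of z.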
I did my best to make this question clear, but I will be happy to clarify in the comments if necessary.
EDIT 6/9/2020: I don't want to return NA for the first grade model; I would just like the linear model for first grade to run without the survey_response column, while keeping survey_response in all the other grade level models.
I hope someone can help!
Upvotes: 1
Views: 301
Reputation: 389055
We can check for NA values in survey_response and use the model accordingly.
library(broom)
library(dplyr)
library(tidyr)
library(purrr)
df_train %>%
  group_by(grade) %>%
  nest() %>%
  mutate(
    # Drop survey_response from the formula when it is NA for the whole grade
    fit = map(data, ~ if (all(is.na(.x$survey_response))) {
      lm(score ~ attendance_rate + test_growth + binary_flag, data = .x)
    } else {
      lm(score ~ attendance_rate + test_growth + binary_flag + survey_response, data = .x)
    }),
    tidied = map(fit, tidy),
    augmented = map(fit, augment),
    glanced = map(fit, glance)
  )
#   grade data              fit    tidied           augmented          glanced
#   <int> <list>            <list> <list>           <list>             <list>
# 1     1 <tibble [24 × 5]> <lm>   <tibble [4 × 5]> <tibble [24 × 11]> <tibble [1 × 11]>
# 2     2 <tibble [24 × 5]> <lm>   <tibble [4 × 5]> <tibble [24 × 11]> <tibble [1 × 11]>
# 3     3 <tibble [24 × 5]> <lm>   <tibble [4 × 5]> <tibble [24 × 11]> <tibble [1 × 11]>
# 4     4 <tibble [24 × 5]> <lm>   <tibble [4 × 5]> <tibble [24 × 11]> <tibble [1 × 11]>
# 5     5 <tibble [24 × 5]> <lm>   <tibble [4 × 5]> <tibble [24 × 11]> <tibble [1 × 11]>
# 6     6 <tibble [24 × 5]> <lm>   <tibble [4 × 5]> <tibble [24 × 11]> <tibble [1 × 11]>
# 7     7 <tibble [24 × 5]> <lm>   <tibble [4 × 5]> <tibble [24 × 11]> <tibble [1 × 11]>
# 8     8 <tibble [24 × 5]> <lm>   <tibble [4 × 5]> <tibble [24 × 11]> <tibble [1 × 11]>
# 9     9 <tibble [24 × 5]> <lm>   <tibble [4 × 5]> <tibble [24 × 11]> <tibble [1 × 11]>
#10    10 <tibble [24 × 5]> <lm>   <tibble [4 × 5]> <tibble [24 × 11]> <tibble [1 × 11]>
#11    11 <tibble [24 × 5]> <lm>   <tibble [4 × 5]> <tibble [24 × 11]> <tibble [1 × 11]>
#12    12 <tibble [24 × 5]> <lm>   <tibble [4 × 5]> <tibble [24 × 11]> <tibble [1 × 11]>
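The same check can be generalized so the formula is built from whichever predictors have data in a given grade, rather than hard-coding two formulas. This is only a sketch under stated assumptions: fit_available is a hypothetical helper name, and toy is a small stand-in for df_train with the same column names. It relies on base R's reformulate to assemble the formula.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Hypothetical helper: fit the response on every predictor that has at least
# one non-NA value in this subset of the data
fit_available <- function(data, response = "score",
                          predictors = c("attendance_rate", "test_growth",
                                         "binary_flag", "survey_response")) {
  has_data <- vapply(predictors, function(p) !all(is.na(data[[p]])), logical(1))
  usable <- predictors[has_data]
  # Fall back to an intercept-only model if no predictor is usable
  if (length(usable) == 0) usable <- "1"
  lm(reformulate(usable, response = response), data = data)
}

# Small stand-in for df_train: survey_response is entirely NA in grade 1
set.seed(123)
toy <- data.frame(
  grade = rep(1:3, each = 20),
  attendance_rate = round(runif(60, min = 25, max = 100), 1),
  test_growth = round(runif(60, min = -12, max = 38)),
  binary_flag = round(runif(60, min = 0, max = 1)),
  score = round(runif(60, min = 92, max = 370)),
  survey_response = round(runif(60, min = 1, max = 4))
)
toy$survey_response[toy$grade == 1] <- NA

models <- toy %>%
  group_by(grade) %>%
  nest() %>%
  mutate(
    fit = map(data, fit_available),
    tidied = map(fit, tidy)
  )

# Number of estimated terms (intercept included) per grade
n_terms <- setNames(map_int(models$tidied, nrow), models$grade)
print(n_terms)
```

Grade 1 ends up with one fewer term than the other grades because survey_response is left out of its formula. The trade-off is that coefficients are no longer directly comparable across grades with different predictor sets.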
Upvotes: 1
Reputation: 887231
We can use possibly from purrr, so that a grade whose model cannot be fitted returns NA instead of stopping the whole pipeline.
library(broom)
library(dplyr)
library(tidyr)
library(purrr)
# Wrap lm so that a failed fit returns NA instead of raising an error
poslm <- possibly(lm, otherwise = NA)
df_train %>%
  group_by(grade) %>%
  nest() %>%
  mutate(
    fit = map(data, ~ poslm(score ~ attendance_rate + test_growth +
                              binary_flag + survey_response, data = .x)),
    # tidy/augment/glance fail on the NA placeholder, so wrap them as well
    tidied = map(fit, possibly(tidy, otherwise = NA)),
    augmented = map(fit, possibly(augment, otherwise = NA)),
    glanced = map(fit, possibly(glance, otherwise = NA))
  )
# A tibble: 12 x 6
# Groups:   grade [12]
#   grade data              fit       tidied           augmented          glanced
#   <int> <list>            <list>    <list>           <list>             <list>
# 1     1 <tibble [24 × 5]> <lgl [1]> <lgl [1]>        <lgl [1]>          <lgl [1]>
# 2     2 <tibble [24 × 5]> <lm>      <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
# 3     3 <tibble [24 × 5]> <lm>      <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
# 4     4 <tibble [24 × 5]> <lm>      <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
# 5     5 <tibble [24 × 5]> <lm>      <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
# 6     6 <tibble [24 × 5]> <lm>      <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
# 7     7 <tibble [24 × 5]> <lm>      <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
# 8     8 <tibble [24 × 5]> <lm>      <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
# 9     9 <tibble [24 × 5]> <lm>      <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
#10    10 <tibble [24 × 5]> <lm>      <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
#11    11 <tibble [24 × 5]> <lm>      <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
#12    12 <tibble [24 × 5]> <lm>      <tibble [5 × 5]> <tibble [24 × 12]> <tibble [1 × 11]>
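Because failed fits come back as NA placeholders, it can help to flag which entries actually hold a model before using them further. A minimal sketch, assuming a made-up data frame (toy, y, x and z are hypothetical names) and using inherits to tell real lm objects from the placeholder:

```r
library(purrr)

# Hypothetical toy data: z is entirely NA, so any formula that includes z
# cannot be fitted
toy <- data.frame(y = c(3, 5, 4, 8, 9), x = 1:5, z = NA_real_)

# Same wrapper as above: return NA instead of raising an error
poslm <- possibly(lm, otherwise = NA)

fits <- list(
  ok = poslm(y ~ x, data = toy),
  failed = poslm(y ~ x + z, data = toy)
)

# TRUE where a real lm object came back, FALSE where the NA placeholder did
fitted_ok <- map_lgl(fits, ~ inherits(.x, "lm"))
print(fitted_ok)
```

In the nested tibble the same idea would be a column such as map_lgl(fit, ~ inherits(.x, "lm")), which identifies the grades that would need to be refitted with a reduced formula.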
Upvotes: 0