tassones

Reputation: 1692

Efficiently extract fitted values from linear regression with many groups

How can I efficiently extract the fitted values from several linear regression models and append them to the original data used to build the models?

Example Data:

library(dplyr)

# Fit several (3 in this case) linear regression models 

fitted_models <- iris %>%
  group_by(Species) %>%
  do(model = lm(Petal.Length ~ Sepal.Length + Sepal.Width, data = .))

I can extract the fitted values for each group manually (see below), but this is cumbersome and does not scale to tens or hundreds of models. How can I extract the fitted values from all of the models more efficiently and append them back to the dataset used to build the models?

df2 <- iris[,c(5,3)]
df2$predicted <- NA
df2[1:50,3] <- fitted_models$model[[1]]$fitted.values
df2[51:100,3] <- fitted_models$model[[2]]$fitted.values 
df2[101:150,3] <- fitted_models$model[[3]]$fitted.values 
df2

Upvotes: 2

Views: 1055

Answers (2)

user10917479


Getting used to nested data frames can be helpful for things like this. Here is one approach for your entire problem.

You can find more examples here:

https://cran.r-project.org/web/packages/broom/vignettes/broom_and_dplyr.html

library(dplyr)
library(tidyr)
library(purrr)

fitted_models <- iris %>%
  nest(data = -Species) %>% 
  mutate(fit = map(data, ~ lm(Petal.Length ~ Sepal.Length + Sepal.Width, data = .x)),
         fitted.values = map(fit, "fitted.values")) %>% 
  unnest(cols = c(data, fitted.values)) %>% 
  select(-fit)
> fitted_models
# A tibble: 150 x 6
   Species Sepal.Length Sepal.Width Petal.Length Petal.Width fitted.values
   <fct>          <dbl>       <dbl>        <dbl>       <dbl>         <dbl>
 1 setosa           5.1         3.5          1.4         0.2          1.47
 2 setosa           4.9         3            1.4         0.2          1.46
 3 setosa           4.7         3.2          1.3         0.2          1.42
 4 setosa           4.6         3.1          1.5         0.2          1.41
 5 setosa           5           3.6          1.4         0.2          1.46
 6 setosa           5.4         3.9          1.7         0.4          1.51
 7 setosa           4.6         3.4          1.4         0.3          1.40
 8 setosa           5           3.4          1.5         0.2          1.46
 9 setosa           4.4         2.9          1.4         0.2          1.38
10 setosa           4.9         3.1          1.5         0.1          1.45
# ... with 140 more rows
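
Since the linked vignette is about broom, here is a hedged sketch of the same nested workflow using broom::augment(), which returns the original columns together with a .fitted column (plus residuals and other diagnostics). It assumes the broom package is installed; the name augmented is illustrative only and not part of the answer above.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

augmented <- iris %>%
  nest(data = -Species) %>%
  mutate(
    # One linear model per species
    fit = map(data, ~ lm(Petal.Length ~ Sepal.Length + Sepal.Width, data = .x)),
    # augment() attaches .fitted, .resid, etc. to the rows used for each fit
    augmented = map2(fit, data, ~ augment(.x, data = .y))
  ) %>%
  select(Species, augmented) %>%
  unnest(augmented)
```

If only the fitted values are wanted, the extra diagnostic columns can be dropped afterwards with select().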

Upvotes: 2

akrun

Reputation: 887223

Because do() returns a rowwise data frame, we can extract the fitted values of each model directly into a list column and then unnest that column.

library(dplyr)
library(tidyr)
fitted_models %>%
    transmute(Species, fitted.values = list(model$fitted.values)) %>% 
    ungroup() %>%
    unnest(fitted.values)

Output:

# A tibble: 150 × 2
   Species fitted.values
   <fct>           <dbl>
 1 setosa           1.47
 2 setosa           1.46
 3 setosa           1.42
 4 setosa           1.41
 5 setosa           1.46
 6 setosa           1.51
 7 setosa           1.40
 8 setosa           1.46
 9 setosa           1.38
10 setosa           1.45
# … with 140 more rows
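
The result above holds only Species and the fitted values. If they need to sit next to the original columns, one possible sketch follows. It assumes the model variables contain no missing values (so each model returns exactly one fitted value per input row) and relies on group_by() and arrange() both ordering rows by Species while keeping the within-species order. The names fitted_long and iris_fitted are illustrative and not part of the answer.

```r
library(dplyr)
library(tidyr)

# Models as built in the question: one row per species, rowwise
fitted_models <- iris %>%
  group_by(Species) %>%
  do(model = lm(Petal.Length ~ Sepal.Length + Sepal.Width, data = .))

# One row per observation, ordered by species and then original row order
fitted_long <- fitted_models %>%
  transmute(Species, fitted.values = list(model$fitted.values)) %>%
  ungroup() %>%
  unnest(fitted.values)

# Sort the data the same way, then attach the fitted values by position
iris_fitted <- iris %>%
  arrange(Species) %>%
  mutate(fitted.values = fitted_long$fitted.values)
```

Matching by position is fragile if rows are dropped during fitting (for example because of NA values), so in that situation a key-based join or a nested approach is the safer choice.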

Upvotes: 3
