Reputation: 39585
I am working in a modeling problem where I have to estimate multiple models per group using specific variables. After having all models per groups I need to compute the estimated values (fitted values) and standard errors (se) for all of them. I found we can use broom
and dplyr
for this. So, I sketched next code using iris
data:
library(dplyr)
library(broom)
#Data
data("iris")
#Code
iris2 <- iris %>% group_by(Species)
Data iris2
has a group based on Species
. With this I compute the different models using next code:
#Models
models <- iris2 %>%
do(
model1 = lm(Sepal.Length~Sepal.Width, data = .),
model2 = lm(Sepal.Width~Petal.Width, data = .),
model3 = lm(Petal.Width~Sepal.Length+Sepal.Width, data = .),
model4 = lm(Petal.Width~Petal.Length+Sepal.Length, data = .))
Which produces:
models
# A tibble: 3 x 5
# Rowwise:
Species model1 model2 model3 model4
<fct> <list> <list> <list> <list>
1 setosa <lm> <lm> <lm> <lm>
2 versicolor <lm> <lm> <lm> <lm>
3 virginica <lm> <lm> <lm> <lm>
All is fine. Now, I need to compute the predictions for the entire dataset iris2
using the four models. I used a merge approach to add the models to the iris2
dataframe:
#Merge with original data
Merged <- iris2 %>%
left_join(models)
At this point, the data has all the models but I am not sure of how to proceed to compute the fitted values and standard error. In the end I would like to have two additional columns per each model like predict.mod1
and se.mod1
until predict.mod4
and se.mod4
for all the observations in iris2
.
This is my main issue as I do not know how to use predict()
to take each model1
, model2
, model3
and model4
to compute both fitted and se values. I know I can use predict()
and predict(model,data,se=T)$$se.fit
to obtain the values in an isolated dataframe with one model but in this case I have multiple models after the merge action.
I have checked some posts around the site and they use mutate(result = map2(fit, data, predict))
to create the estimations. I tried a similar approach but did not work.
Many thanks for your help.
Upvotes: 1
Views: 298
Reputation: 886938
Instead of separating the dataset and joining, could create the models in the same data after nesting
library(dplyr)
library(broom)
iris1 <- iris %>%
nest_by(Species) %>%
mutate(model1 = list(lm(Sepal.Length ~ Sepal.Width, data = data)),
model2 = list(lm(Sepal.Width~Petal.Width, data = data)),
model3 = list(lm(Petal.Width~Sepal.Length+Sepal.Width, data = data)),
model4 = list(lm(Petal.Width~Petal.Length+Sepal.Length, data = data)))
-output
iris1
# A tibble: 3 x 6
# Rowwise: Species
# Species data model1 model2 model3 model4
# <fct> <list<tibble>> <list> <list> <list> <list>
#1 setosa [50 × 4] <lm> <lm> <lm> <lm>
#2 versicolor [50 × 4] <lm> <lm> <lm> <lm>
#3 virginica [50 × 4] <lm> <lm> <lm> <lm>
Create the columns in a similar pipeline as used by @LMc except that the 'data' can be passed in (which is not really needed here unless there is a different data)
iris1 %>%
mutate(across(starts_with('model'),
list(se = ~ list(tidy(.)$std.error),
predict = ~ list(predict(., data)))))
# A tibble: 3 x 14
# Rowwise: Species
# Species data model1 model2 model3 model4 model1_se model1_predict model2_se model2_predict model3_se model3_predict model4_se
# <fct> <list<tibble>> <list> <list> <list> <list> <list> <list> <list> <list> <list> <list> <list>
#1 setosa [50 × 4] <lm> <lm> <lm> <lm> <dbl [2]> <dbl [50]> <dbl [2]> <dbl [50]> <dbl [3]> <dbl [50]> <dbl [3]>
#2 versicolor [50 × 4] <lm> <lm> <lm> <lm> <dbl [2]> <dbl [50]> <dbl [2]> <dbl [50]> <dbl [3]> <dbl [50]> <dbl [3]>
#3 virginica [50 × 4] <lm> <lm> <lm> <lm> <dbl [2]> <dbl [50]> <dbl [2]> <dbl [50]> <dbl [3]> <dbl [50]> <dbl [3]>
# … with 1 more variable: model4_predict <list>
From the nested data, it can be unest
ed. Here, we are select
ing columns that have the same length (model
columns are different structure)
library(tidyr)
iris1 %>%
mutate(across(starts_with('model'),
list(se = ~ list(tidy(.)$std.error),
predict = ~ list(predict(., data))))) %>%
ungroup %>%
select(Species, data, ends_with('predict')) %>%
unnest(-Species)
# A tibble: 150 x 9
# Species Sepal.Length Sepal.Width Petal.Length Petal.Width model1_predict model2_predict model3_predict model4_predict
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 setosa 5.1 3.5 1.4 0.2 5.06 3.39 0.254 0.241
# 2 setosa 4.9 3 1.4 0.2 4.71 3.39 0.232 0.229
# 3 setosa 4.7 3.2 1.3 0.2 4.85 3.39 0.221 0.200
# 4 setosa 4.6 3.1 1.5 0.2 4.78 3.39 0.212 0.228
# 5 setosa 5 3.6 1.4 0.2 5.12 3.39 0.248 0.235
# 6 setosa 5.4 3.9 1.7 0.4 5.33 3.56 0.281 0.310
# 7 setosa 4.6 3.4 1.4 0.3 4.99 3.47 0.217 0.211
# 8 setosa 5 3.4 1.5 0.2 4.99 3.39 0.245 0.252
# 9 setosa 4.4 2.9 1.4 0.2 4.64 3.39 0.195 0.199
#10 setosa 4.9 3.1 1.5 0.1 4.78 3.31 0.233 0.246
# … with 140 more rows
Or using the se
from predict
iris1 %>%
mutate(across(starts_with('model'),
list(se = ~ list(predict(., data, se = TRUE)$se.fit),
predict = ~ list(predict(., data))))) %>%
ungroup %>%
select(-matches('^model\\d+$')) %>%
unnest(-Species)
Upvotes: 2