Reputation: 7147
I am trying to map multiple models over some data and store the results in something similar to a nested tibble or multiple lists. I would like to apply the models in the same pipe. I run the following:
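library(dplyr)
library(purrr)
data(iris)
# Keep two species and recode the response as 0/1 (1 = virginica)
df <- iris %>%
  filter(Species != "setosa") %>%
  mutate(Species = +(Species == "virginica"))
# All ordered pairs of distinct predictor columns
var_combos <- expand.grid(colnames(df[,1:4]), colnames(df[,1:4])) %>%
  filter(Var1 != Var2)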
map2(
  .x = var_combos$Var1,
  .y = var_combos$Var2,
  ~select(df, .x, .y) %>%
    mutate(
      Species = df$Species
    )
) %>%
  map(., ~glm(Species ~ ., data = ., family = binomial(link='logit')))
This gets me a nice list of logistic models, one per pair of variables. How can I store these models in a nested tibble or a list, and then mutate and add more models to be stored next to them, such as:
... %>%
map(., ~glm(Species ~ ., data = ., family = binomial(link='logit'))) %>%
map(., ~e1071::svm(Species ~ ., data = ., kernel = "polynomial"))
Upvotes: 1
Views: 173
Reputation: 887951
Loop over the elements of the 'var_combos' columns with map2, nest the 'data' by creating a dummy grouping column, then map over the 'data' and store a list of models as a new column:
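library(purrr)
library(dplyr)
library(tidyr)  # provides nest()
out1 <- map2(
  var_combos$Var1,
  var_combos$Var2, ~
    df %>%
      # as.character() so the selection works whether Var1/Var2 are factors or strings
      select(Species, all_of(as.character(.x)), all_of(as.character(.y))) %>%
      group_by(grp = 'grp') %>%   # dummy group so the data nests into a single row
      nest() %>%
      # fit both models on the nested data and keep them together in a list column
      mutate(models = map(data, ~ {
        list(glm(Species ~ ., data = .x, family = binomial(link='logit')),
             e1071::svm(Species ~ ., data = .x, kernel = "polynomial"))
      })))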
out1[1:3]
#[[1]]
# A tibble: 1 x 3
# Groups: grp [1]
# grp data models
# <chr> <list> <list>
#1 grp <tibble [100 × 3]> <list [2]>
#[[2]]
# A tibble: 1 x 3
# Groups: grp [1]
# grp data models
# <chr> <list> <list>
#1 grp <tibble [100 × 3]> <list [2]>
#[[3]]
# A tibble: 1 x 3
# Groups: grp [1]
# grp data models
# <chr> <list> <list>
#1 grp <tibble [100 × 3]> <list [2]>
Checking the 'models'
out1[[1]]$models
#[[1]]
#[[1]][[1]]
#Call: glm(formula = Species ~ ., family = binomial(link = "logit"),
#    data = .x)
#Coefficients:
# (Intercept) Sepal.Width Sepal.Length
# -13.0460 0.4047 1.9024
#Degrees of Freedom: 99 Total (i.e. Null); 97 Residual
#Null Deviance: 138.6
#Residual Deviance: 110.3 AIC: 116.3
#[[1]][[2]]
#Call:
#svm(formula = Species ~ ., data = .x, kernel = "polynomial")
#Parameters:
# SVM-Type: eps-regression
# SVM-Kernel: polynomial
# cost: 1
# degree: 3
# gamma: 0.5
# coef.0: 0
# epsilon: 0.1
#Number of Support Vectors: 98
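In the output above the two fits are addressed by position ([[1]][[1]] and [[1]][[2]]). As a minimal sketch, assuming the e1071 package is installed, naming the list elements makes each model retrievable by name instead. The names 'dat', 'models', 'glm' and 'svm' are illustrative choices, not part of the code above, and only one variable pair is fitted here to keep the example short.

```r
# Rebuild the two-class data from the question (1 = virginica, 0 = versicolor)
df <- subset(iris, Species != "setosa")
df$Species <- +(df$Species == "virginica")

# One predictor pair, chosen only for illustration
dat <- df[c("Species", "Sepal.Width", "Sepal.Length")]

# A named list: each fit is addressable by name rather than by position
models <- list(
  glm = glm(Species ~ ., data = dat, family = binomial(link = "logit")),
  svm = e1071::svm(Species ~ ., data = dat, kernel = "polynomial")
)

coef(models$glm)   # coefficients of the logistic model
class(models$svm)  # class of the SVM fit
```

The same naming can be applied inside the map call, so that out1[[1]]$models[[1]]$glm replaces the positional lookup.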
The reason for nesting is to avoid repeating the models for every row of the 'data', which is what a plain mutate would do. Here, the data is a list column, and at any point we can unnest it to get the 'long' format:
library(tidyr)
out1 %>%
map(~ .x %>%
unnest(c(data)))
After unnesting, the 'models' list is repeated for each row. So it is better to keep it in a list column, or even to extract the 'models' as a separate dataset.
If we want to flatten the 'models':
map(out1, ~ .x %>%
mutate(models = list(flatten(models)) ) %>%
unnest(c(models)))
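An alternative layout, sketched here under the assumption that one tibble for all variable pairs is acceptable, is to keep one row per pair and give each model type its own list column. Adding another model later is then just one more mutate. The names 'predictors', 'models_tbl', 'glm_fit' and 'svm_fit' are made up for this example, and e1071 is assumed to be installed.

```r
library(dplyr)
library(purrr)
library(tibble)

# Two-class data as in the question (1 = virginica, 0 = versicolor)
df <- iris %>%
  filter(Species != "setosa") %>%
  mutate(Species = +(Species == "virginica"))

predictors <- setdiff(names(df), "Species")

# One row per ordered pair of distinct predictors, kept as character columns
var_combos <- expand.grid(Var1 = predictors, Var2 = predictors,
                          stringsAsFactors = FALSE) %>%
  filter(Var1 != Var2) %>%
  as_tibble()

models_tbl <- var_combos %>%
  mutate(
    # the data subset for each pair, stored once in a list column
    data    = map2(Var1, Var2, ~ df[c("Species", .x, .y)]),
    # one list column per model type, fitted on that subset
    glm_fit = map(data, ~ glm(Species ~ ., data = .x,
                              family = binomial(link = "logit"))),
    svm_fit = map(data, ~ e1071::svm(Species ~ ., data = .x,
                                     kernel = "polynomial"))
  )

models_tbl
```

Individual fits can then be pulled out with, for example, models_tbl$glm_fit[[1]], and per-model summaries computed with something like map_dbl(models_tbl$glm_fit, AIC).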
Upvotes: 1