user113156

Reputation: 7147

map and store results in nested tibbles

I am trying to map multiple models over some data and store the results in something similar to a nested tibble or multiple lists. I would like to apply the models in the same pipe. I run the following:

library(dplyr)
library(purrr)

data(iris)
df <- iris %>% 
  filter(Species != "setosa") %>% 
  mutate(Species = +(Species == "virginica"))

var_combos <- expand.grid(colnames(df[,1:4]), colnames(df[,1:4]), stringsAsFactors = FALSE) %>% 
  filter(Var1 != Var2)

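map2(
  .x = var_combos$Var1,
  .y = var_combos$Var2,
  # as.character() guards against expand.grid() returning factors
  ~ df %>% 
    select(all_of(c(as.character(.x), as.character(.y)))) %>% 
    mutate(Species = df$Species)
) %>%
  map(~ glm(Species ~ ., data = .x, family = binomial(link = 'logit')))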

This gives me a list of logistic models, one per pair of variables. How can I store these models in a nested tibble or a list, and then mutate to add more models stored next to them, such as:

 ...   %>%
      map(., ~glm(Species ~ ., data = ., family = binomial(link='logit'))) %>% 
      map(., e1071::svm(Species ~ ., data = ., kernel = "polynomial"))

Upvotes: 1

Views: 173

Answers (1)

akrun

Reputation: 887951

Loop over the 'Var1' and 'Var2' columns of 'var_combos' with map2. Inside the loop, nest the selected data by grouping on a dummy column, then map over the nested 'data' column and store the list of models as a new column

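library(purrr)
library(dplyr)
library(tidyr)   # provides nest()

out1 <- map2(
  var_combos$Var1,
  var_combos$Var2,
  ~ df %>%
    # as.character() guards against expand.grid() returning factors
    select(Species, all_of(c(as.character(.x), as.character(.y)))) %>%
    # dummy group so that the whole data set nests into a single row
    group_by(grp = 'grp') %>%
    nest() %>%
    # fit both models on the nested data and keep them together in a list
    mutate(models = map(data, ~ list(
      glm(Species ~ ., data = .x, family = binomial(link = 'logit')),
      e1071::svm(Species ~ ., data = .x, kernel = "polynomial")
    )))
)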
out1[1:3]
#[[1]]
# A tibble: 1 x 3
# Groups:   grp [1]
#  grp   data               models    
#  <chr> <list>             <list>    
#1 grp   <tibble [100 × 3]> <list [2]>

#[[2]]
# A tibble: 1 x 3
# Groups:   grp [1]
#  grp   data               models    
#  <chr> <list>             <list>    
#1 grp   <tibble [100 × 3]> <list [2]>

#[[3]]
# A tibble: 1 x 3
# Groups:   grp [1]
#  grp   data               models    
#  <chr> <list>             <list>    
#1 grp   <tibble [100 × 3]> <list [2]>

Checking the 'models'

out1[[1]]$models
#[[1]]
#[[1]][[1]]

#Call:  glm(formula = Species ~ ., family = binomial(link = "logit"), 
#    data = .x)

#Coefficients:
# (Intercept)   Sepal.Width  Sepal.Length  
#    -13.0460        0.4047        1.9024  

#Degrees of Freedom: 99 Total (i.e. Null);  97 Residual
#Null Deviance:     138.6 
#Residual Deviance: 110.3   AIC: 116.3

#[[1]][[2]]

#Call:
#svm(formula = Species ~ ., data = .x, kernel = "polynomial")


#Parameters:
#   SVM-Type:  eps-regression 
# SVM-Kernel:  polynomial 
#       cost:  1 
#     degree:  3 
#      gamma:  0.5 
#     coef.0:  0 
#    epsilon:  0.1 


#Number of Support Vectors:  98

The reason for nesting is that each list of models is then stored once, instead of being repeated unnecessarily on every row of the 'data', which is what a plain mutate would do. Here, the data is a list column, and at any point we can unnest it to get the 'long' format

 library(tidyr)
 out1 %>% 
       map(~ .x %>% 
              unnest(c(data)))

After unnesting, the 'models' list is repeated for each row. So it is better to keep the models in a list column, or even to extract 'models' as a separate dataset
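As a rough sketch of the list-column idea, the variable pairs, their data and each kind of model can also sit side by side in a single tibble, one row per pair. This is not part of the code above: the names `models_tbl`, `predictors`, `glm_fit` and `svm_fit` are made up for illustration, and it assumes the tibble and e1071 packages are installed.

```r
library(dplyr)
library(purrr)
library(tibble)

# Same two-class data as in the question
df <- iris %>%
  filter(Species != "setosa") %>%
  mutate(Species = +(Species == "virginica"))

predictors <- colnames(df)[1:4]

# All ordered pairs of distinct predictors, kept as character (not factor)
var_combos <- expand.grid(Var1 = predictors, Var2 = predictors,
                          stringsAsFactors = FALSE) %>%
  filter(Var1 != Var2)

# One row per pair; data and models are list columns (hypothetical names)
models_tbl <- as_tibble(var_combos) %>%
  mutate(
    data    = map2(Var1, Var2, ~ select(df, Species, all_of(c(.x, .y)))),
    glm_fit = map(data, ~ glm(Species ~ ., data = .x,
                              family = binomial(link = "logit"))),
    svm_fit = map(data, ~ e1071::svm(Species ~ ., data = .x,
                                     kernel = "polynomial"))
  )
```

A further model can then be added with another `mutate(new_fit = map(data, ...))`, and an individual fit is retrieved with, for example, `models_tbl$glm_fit[[1]]`.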

Update

If we want to flatten the 'models':

map(out1, ~ .x %>% 
        mutate(models = list(flatten(models)) ) %>%
        unnest(c(models)))
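Another way to end up with one model per row, sketched here under a few assumptions, is to cross the variable pairs with a named list of fitting functions and fit each combination once. The names `fitters`, `model_type`, `fit` and `long_tbl` are hypothetical, `cross_join()` needs dplyr 1.1.0 or later, and e1071 is assumed to be installed.

```r
library(dplyr)
library(purrr)

# Same two-class data as in the question
df <- iris %>%
  filter(Species != "setosa") %>%
  mutate(Species = +(Species == "virginica"))

predictors <- colnames(df)[1:4]

var_combos <- expand.grid(Var1 = predictors, Var2 = predictors,
                          stringsAsFactors = FALSE) %>%
  filter(Var1 != Var2)

# Hypothetical lookup of fitting functions, keyed by model type
fitters <- list(
  glm = function(d) glm(Species ~ ., data = d,
                        family = binomial(link = "logit")),
  svm = function(d) e1071::svm(Species ~ ., data = d, kernel = "polynomial")
)

model_types <- data.frame(model_type = names(fitters),
                          stringsAsFactors = FALSE)

# One row per (variable pair, model type); data and fit are list columns
long_tbl <- cross_join(var_combos, model_types) %>%
  as_tibble() %>%
  mutate(
    data = map2(Var1, Var2, ~ select(df, Species, all_of(c(.x, .y)))),
    fit  = map2(model_type, data, ~ fitters[[.x]](.y))
  )
```

With this layout, `filter(long_tbl, model_type == "glm")` keeps only the logistic fits, and adding another model type only requires another entry in `fitters`.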

Upvotes: 1
