hnguyen
hnguyen

Reputation: 814

`group_by` and keep grouping levels as nested data frame's name

This question is related to and inspired by can't use emmeans inside map

I am doing several steps of data analysis with the following code. I want to keep my grouping factor's levels as the nested data frames' names and uses those names to identify each of the steps along the way, instead of using the default enumeration [[1]], [[2]], [[3]], etc. I don't understand the error I got. Please see how I can fix my code.

library(dplyr)
library(purrr)
library(emmeans)
data("warpbreaks")
wb_emm <-  warpbreaks %>%
  group_by(tension) %>% 
  setNames(unique(.x$tension)) %>%
  nest() %>%
  mutate(models=map(data,~glm(breaks~wool,data=.x))) %>%
  mutate(jt = map(models, ~emmeans::joint_tests(.x, data = .x$data))) %>%
  mutate(means=map(models,~emmeans::emmeans(.x,"wool",data=.x$data))) %>%
  mutate(p_cont = map(means, ~emmeans::contrast(.x, "pairwise",infer = c(T,T))))

Error in unique(.x$tension) : object '.x' not found

I originally did group_by(tension) %>% setNames(unique(tension)) and got Error in unique(tension) : object 'tension' not found I also tried split(.$tension) but it is conflicted with nest()

But the tension levels are legible.

 unique(warpbreaks$tension)
[1] L M H
Levels: L M H

The code runs well without the setNames(unique(.x$tension)) %>% step.

wb_emm$p_cont
[[1]]
 contrast estimate   SE  df asymp.LCL asymp.UCL z.ratio p.value
 A - B        16.3 6.87 Inf      2.87      29.8 2.378   0.0174 

Confidence level used: 0.95 

[[2]]
 contrast estimate   SE  df asymp.LCL asymp.UCL z.ratio p.value
 A - B       -4.78 4.27 Inf     -13.1      3.59 -1.119  0.2630 

Confidence level used: 0.95 

[[3]]
 contrast estimate   SE  df asymp.LCL asymp.UCL z.ratio p.value
 A - B        5.78 3.79 Inf     -1.66      13.2 1.523   0.1277 

Confidence level used: 0.95 

Thank you.

Update: from the second solution provided by Ronak Shah below, I tried on diamonds but the names were unchanged. The code runs with either ungroup()%>% or ungroup%>%.

diamonds %>%
  group_by(cut) %>%
  nest() %>% 
  ungroup %>%
  mutate(models=map(data,~glm(price ~ x + y + z + clarity + color,data=.x)),
         jt = map(models, ~emmeans::joint_tests(.x, data = .x$data)),
         means=map(models,~emmeans::emmeans(.x,"color",data=.x$data)),
         p_cont = map(means, ~emmeans::contrast(.x, "pairwise",infer = c(T,T))),
         across(models:p_cont, stats::setNames,  .$cut)) -> diamond_result
> diamond_result$jt
[[1]]
 model term df1 df2 F.ratio p.value
 x            1 Inf 611.626 <.0001 
 y            1 Inf   2.914 0.0878 
 z            1 Inf 100.457 <.0001 
 clarity      7 Inf 800.852 <.0001 
 color        6 Inf 256.796 <.0001 

Upvotes: 0

Views: 514

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 389325

You need to add setNames in the map step :

library(tidyverse)

warpbreaks %>%
  group_by(tension) %>% 
  nest() %>%
  ungroup %>%
  mutate(models=map(data,~glm(breaks~wool,data=.x)),
        jt = map(models, ~emmeans::joint_tests(.x, data = .x$data)),
        means=map(models,~emmeans::emmeans(.x,"wool",data=.x$data)),
        p_cont = setNames(map(means, 
                  ~emmeans::contrast(.x, "pairwise",infer = c(T,T))),.$tension))

If you want to name all the list output use across :

warpbreaks %>%
  group_by(tension) %>% 
  nest() %>%
  ungroup %>%
  mutate(models=map(data,~glm(breaks~wool,data=.x)),
         jt = map(models, ~emmeans::joint_tests(.x, data = .x$data)),
         means=map(models,~emmeans::emmeans(.x,"wool",data=.x$data)),
         p_cont = map(means, ~emmeans::contrast(.x, "pairwise",infer = c(T,T))),
         across(models:p_cont, setNames,  .$tension)) -> result

result$jt

#$L
# model term df1 df2 F.ratio p.value
# wool         1 Inf   5.653 0.0174 


#$M
# model term df1 df2 F.ratio p.value
# wool         1 Inf   1.253 0.2630 


#$H
# model term df1 df2 F.ratio p.value
# wool         1 Inf   2.321 0.1277 

Upvotes: 2

Related Questions