Using parameter-specific information in summarize()

Question

Given the following data frame:

mydf <- data.frame(
    Treatment = c('T1', 'T1', 'T1', 'T1', 'T1', 'T1', 'T2', 'T2', 'T2', 'T2', 'T2', 'T2'),
    Observation = c('pH', 'pH', 'pH', 'RS', 'RS', 'RS', 'pH', 'pH', 'pH', 'RS', 'RS', 'RS'),
    Value = c(3.13, 3.21, 3.26, 19.20, 19.50, 9.70, 3.13, 3.40, 3.31, 11.00, 18.10, 7.50)
)

I need to generate a data frame where the rows are treatments, the columns are observations, and the values are strings giving the mean and standard deviation of the relevant values. Here is some code which builds such a data frame:

library(dplyr)
library(tidyr)

mydf %>% group_by(Treatment, Observation) %>%
  summarise(MeanSD = sprintf("%0.2f $\\pm$ %0.2f", mean(Value), sd(Value))) %>%
  spread(Observation, MeanSD) %>%
  ungroup()

And here is the output of that code:

# A tibble: 2 x 3
  Treatment                 pH                  RS
*                                 
1        T1 "3.20 $\pm$ 0.07" "16.13 $\pm$ 5.57"
2        T2 "3.28 $\pm$ 0.14" "12.20 $\pm$ 5.40"

I have now been told that I need to set the number of decimal places for those strings based on the observation. For the sake of argument, let's assume the pH mean and SD should have 2 and 2 decimal places, respectively, while the RS mean and SD should have 0 and 1, respectively.

fmtStr <- list('pH'="%0.2f $\\pm$ %0.2f", 'RS'="%0.0f $\\pm$ %0.1f")

I tried this:

mydf %>% group_by(Treatment, Observation) %>% 
  summarise(MeanSD = sprintf(fmtStr[[Observation]], mean(Value), sd(Value))) %>% 
  spread(Observation, MeanSD) %>% 
  ungroup()

And that generated this error:

Error in summarise_impl(.data, dots) : 
  Evaluation error: recursive indexing failed at level 2
.

What's the right incantation to achieve my goal?

CJ Yetman · Accepted Answer

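You get that error because [[ extracts a single element and expects a single index. Given a longer vector, it tries to index recursively into nested lists, which fails here...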

fmtStr[[mydf$Observation]]
# Error in fmtStr[[mydf$Observation]] : 
#   recursive indexing failed at level 2
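
To see the difference outside of dplyr, here is a minimal base R sketch. The vector obs is a made-up stand-in for the Observation column, and fmtStr is redefined so the snippet runs on its own:

```r
fmtStr <- list(
  pH = "%0.2f $\\pm$ %0.2f",
  RS = "%0.0f $\\pm$ %0.1f"
)

# [[ extracts one element, so it wants one index.
one_fmt <- fmtStr[["RS"]]        # a single character string

# [ accepts a vector of names and returns a list of the same length.
obs <- c("pH", "pH", "RS")       # hypothetical stand-in for mydf$Observation
many_fmt <- fmtStr[obs]          # a list of 3 format strings

# A longer vector passed to [[ is treated as a path into nested lists,
# so this call errors instead of returning three elements.
res <- try(fmtStr[[obs]], silent = TRUE)
inherits(res, "try-error")
```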

You can subset the list with fmtStr[mydf$Observation] and convert it to a character vector with unlist(), but that still won't work in your summarise() command because you'll get one format string per row in the group rather than a single one for the summary value...

mydf %>% 
  group_by(Treatment, Observation) %>% 
  summarise(MeanSD = sprintf(unlist(fmtStr[Observation]), mean(Value), sd(Value)))
# Error in summarise_impl(.data, dots) : 
#   Column `MeanSD` must be length 1 (a summary value), not 3

Since your data is grouped by Observation, every value of Observation is the same within a group, so you can just use the first one...

mydf %>% 
  group_by(Treatment, Observation) %>% 
  summarise(MeanSD = sprintf(fmtStr[Observation][[1]], mean(Value), sd(Value)))
# # A tibble: 4 x 3
# # Groups:   Treatment [?]
#   Treatment Observation MeanSD            
#                            
# 1 T1        pH          "3.20 $\pm$ 0.07"
# 2 T1        RS          "16 $\pm$ 5.6"   
# 3 T2        pH          "3.28 $\pm$ 0.14"
# 4 T2        RS          "12 $\pm$ 5.4"
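
One caveat that may or may not apply to your setup: in R versions before 4.0, data.frame() converts strings to factors by default, and indexing a list with a factor uses the underlying integer codes rather than the labels. In that case the lookup above works only because the order of fmtStr happens to match the alphabetical order of the factor levels. The sketch below uses a hypothetical list, fmtRev, with the entries reversed to show the difference, and a lookup by name that does not depend on the order:

```r
# Same formats as fmtStr, deliberately listed in non-alphabetical order.
fmtRev <- list(
  RS = "%0.0f $\\pm$ %0.1f",
  pH = "%0.2f $\\pm$ %0.2f"
)

# Hypothetical group of pH rows stored as a factor; "pH" has integer code 1.
obs_fct <- factor(c("pH", "pH", "pH"))

# Indexing by factor uses the integer code, so this picks element 1 (the RS format).
by_code <- fmtRev[obs_fct][[1]]

# Converting to character first looks the format up by name (the pH format).
by_name <- fmtRev[[as.character(obs_fct[1])]]

print(c(by_code = by_code, by_name = by_name))
```

Inside summarise(), the name-based version would read sprintf(fmtStr[[as.character(Observation[1])]], mean(Value), sd(Value)).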

So your full code would look like...

mydf %>% 
  group_by(Treatment, Observation) %>% 
  summarise(MeanSD = sprintf(fmtStr[Observation][[1]], mean(Value), sd(Value))) %>% 
  spread(Observation, MeanSD) %>% 
  ungroup()
# # A tibble: 2 x 3
#   Treatment pH                 RS             
#                                
# 1 T1        "3.20 $\pm$ 0.07" "16 $\pm$ 5.6"
# 2 T2        "3.28 $\pm$ 0.14" "12 $\pm$ 5.4"
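
If you are on newer package versions, the same pipeline can probably be written as below. This is a sketch that assumes dplyr 1.0.0 or later (for the .groups argument) and tidyr 1.0.0 or later (for pivot_wider(), which supersedes spread()). The as.character() call is an addition so that the format is looked up by name even if Observation is a factor:

```r
library(dplyr)
library(tidyr)

# Same data as in the question, built more compactly.
mydf <- data.frame(
  Treatment = rep(c("T1", "T2"), each = 6),
  Observation = rep(rep(c("pH", "RS"), each = 3), times = 2),
  Value = c(3.13, 3.21, 3.26, 19.20, 19.50, 9.70,
            3.13, 3.40, 3.31, 11.00, 18.10, 7.50)
)

fmtStr <- list(pH = "%0.2f $\\pm$ %0.2f", RS = "%0.0f $\\pm$ %0.1f")

result <- mydf %>%
  group_by(Treatment, Observation) %>%
  summarise(
    # One format per group, chosen by the group's (constant) Observation name.
    MeanSD = sprintf(fmtStr[[as.character(Observation[1])]], mean(Value), sd(Value)),
    .groups = "drop"   # return an ungrouped tibble, so no ungroup() is needed
  ) %>%
  pivot_wider(names_from = Observation, values_from = MeanSD)

print(result)
```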
