Fourier
Fourier

Reputation: 2983

Multiple ratios by column-wise division with dplyr following grouping

I have a df that needs to be grouped by multiple columns to subsequently calculate ratios for subset of different columns and the row-wise means and standard deviations.

grouper1 grouper2 condition value
foo      baz      A         1
foo      baz      B         2
foo      oof      A         1
foo      oof      C         3
bar      zab      B         2
bar      zab      C         4

Based on this elegant answer I have managed to built a generic solution:

library(dplyr)
library(tidyr)
library(purrr)
library(stringr)


crossing(c("A"), c("B","C")) %>%
  pmap(~ query %>%
         group_by(grouper1, grouper2) %>%
            summarise(!! str_c('ratio_', ..1, ..2) :=
                 value[condition == ..1]/value[condition == ..2])) %>% 
            reduce(full_join, by = c('grouper1', 'grouper2')) %>% 
  ungroup() %>% mutate(mean=rowMeans(select(.,-(grouper1, grouper2)), SD=unlist(pmap(select(.,-(grouper1, grouper2)), ~sd(c(...)))))

This works well if all the values in condition column are found in all groups. If this is not the case, e.g. A is not present in the second grouping using grouper1 in the above example, I will receive the following error:

Error: Column ratio_AC must be length 1 (a summary value), not 0

I could obviously preselect the values for crossing but this would require a filter on the df and I will loose generality. I would thus like a solution that simply ignores the missing combinations and still calculates the metrics.

One possible solution would be pivot_wider but here I cannot implement a working solution for calculation the ratios.

Upvotes: 1

Views: 177

Answers (1)

akrun
akrun

Reputation: 886948

We could reshape to wide format with pivot_wider and then use that dataset

library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
df1 <- df %>% 
          pivot_wider(names_from = condition, values_from = value)


crossing(v1 = c("A"), v2 = c("B","C")) %>%
  pmap(~ df1 %>%           
           transmute(grouper1, grouper2, 
          !! str_c('ratio_', ..1, ..2) :=
                 .[[..1]]/.[[..2]]))%>% 
            reduce(full_join, by = c('grouper1', 'grouper2'))  %>%
   mutate(mean = rowMeans(select(., -grouper1, -grouper2), na.rm = TRUE), 
          SD=   pmap_dbl(select(., -grouper1, -grouper2), 
              ~sd(c(...), na.rm = TRUE)))

data

df <- structure(list(grouper1 = c("foo", "foo", "foo", "foo", "bar", 
"bar"), grouper2 = c("baz", "baz", "oof", "oof", "zab", "zab"
), condition = c("A", "B", "A", "C", "B", "C"), value = c(1L, 
2L, 1L, 3L, 2L, 4L)), class = "data.frame", row.names = c(NA, 
-6L))

Upvotes: 1

Related Questions