Reputation: 2983
I have a df
that needs to be grouped by multiple columns to subsequently calculate ratios for subset of different columns and the row-wise means and standard deviations.
grouper1 grouper2 condition value
foo baz A 1
foo baz B 2
foo oof A 1
foo oof C 3
bar zab B 2
bar zab C 4
Based on this elegant answer I have managed to built a generic solution:
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
crossing(c("A"), c("B","C")) %>%
pmap(~ query %>%
group_by(grouper1, grouper2) %>%
summarise(!! str_c('ratio_', ..1, ..2) :=
value[condition == ..1]/value[condition == ..2])) %>%
reduce(full_join, by = c('grouper1', 'grouper2')) %>%
ungroup() %>% mutate(mean=rowMeans(select(.,-(grouper1, grouper2)), SD=unlist(pmap(select(.,-(grouper1, grouper2)), ~sd(c(...)))))
This works well if all the values in condition
column are found in all groups. If this is not the case, e.g. A
is not present in the second grouping using grouper1
in the above example, I will receive the following error:
Error: Column ratio_AC must be length 1 (a summary value), not 0
I could obviously preselect the values for crossing
but this would require a filter on the df
and I will loose generality. I would thus like a solution that simply ignores the missing combinations and still calculates the metrics.
One possible solution would be pivot_wider
but here I cannot implement a working solution for calculation the ratios.
Upvotes: 1
Views: 177
Reputation: 886948
We could reshape to wide format with pivot_wider
and then use that dataset
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
df1 <- df %>%
pivot_wider(names_from = condition, values_from = value)
crossing(v1 = c("A"), v2 = c("B","C")) %>%
pmap(~ df1 %>%
transmute(grouper1, grouper2,
!! str_c('ratio_', ..1, ..2) :=
.[[..1]]/.[[..2]]))%>%
reduce(full_join, by = c('grouper1', 'grouper2')) %>%
mutate(mean = rowMeans(select(., -grouper1, -grouper2), na.rm = TRUE),
SD= pmap_dbl(select(., -grouper1, -grouper2),
~sd(c(...), na.rm = TRUE)))
df <- structure(list(grouper1 = c("foo", "foo", "foo", "foo", "bar",
"bar"), grouper2 = c("baz", "baz", "oof", "oof", "zab", "zab"
), condition = c("A", "B", "A", "C", "B", "C"), value = c(1L,
2L, 1L, 3L, 2L, 4L)), class = "data.frame", row.names = c(NA,
-6L))
Upvotes: 1