Mario Niepel

Reputation: 1165

propagating controls for a number of nested groups

This is a follow-up question to a previous post here.

I got a good set of answers from @akrun for my toy problem, but while going through them I realized that they are not yet applicable to my real-life problem. This illustration of the problem is still correct:

Before: [image: boxplots of the `before` data, faceted by group]

After: [image: boxplots of the `after` data, faceted by group]

The further challenge is that 'group' and 'treatment' are only the two levels I describe here, because they are where the controls need to be propagated. In fact, there are four additional grouping variables at the same level as the groups, each with its own sets of positive and negative controls.

So the solution can't be to identify all negative/positive controls and append them to each of the groups. The controls need to be identified within each combination of the grouping variables and then propagated only to the groups they belong to. Since dplyr seems very well suited for this type of approach, I think that's the way to go, but I am stuck in the middle. To me this suggests a purrr solution, but other than reduce() I have not worked with purrr at all. Or is a group_by() %>% nest() %>% ... %>% unnest() pipeline the way to go?

library(tidyverse)

data_propagated_controls <- data %>%
  # \\ group the data by all the grouping variables,
  # \\ excluding the 'treatment' variable
  group_by(var1, var2, var3) %>%
  # \\ split into a list of individual data frames
  group_split() %>%
  # \\ for each list item, propagate the controls
  # \\ (same problem as illustrated below); this is the part
  # \\ I am stuck on, so the steps are pseudocode only:
  # \\   1. identify and extract controls
  # \\   2. append controls to each treatment
  # \\   3. add a column to distinguish treatment/controls
  # \\   4. join all treatments/controls by rbind
  # \\ reassemble the data frame from the list
  # \\ (reduce with rbind or full_join should work)
  reduce(rbind)

Illustration of the problem from previous post:

library(ggplot2)

before <- structure(list(group = c("grp1", "grp1", "grp1", "grp1", 
"grp2", "grp2", "grp2", "grp2", "grp3", "grp3", "grp3", "grp3", 
"neg", "neg", "pos", "pos"), treatment = c("A", "B", "C", 
"D", "A", "B", "C", "D", "A", "B", "C", "D", "none", "none", 
"none", "none"), value = c(3L, 5L, 7L, 9L, 2L, 4L, 6L, 8L, 3L, 
4L, 6L, 9L, 12L, 10L, 1L, 2L)), class = "data.frame", row.names = c(NA, -16L))

ggplot(data = before, aes(x = treatment, y = value)) + geom_boxplot() + facet_wrap(~group)

after <- structure(list(group = c("grp1", "grp1", "grp1", "grp1", "grp1", "grp1", 
"grp1", "grp1", "grp2", "grp2", "grp2", "grp2", "grp2", "grp2", 
"grp2", "grp2", "grp3", "grp3", "grp3", "grp3", "grp3", "grp3", 
"grp3", "grp3"), treatment = c("A", "B", "C", "D", "neg", "neg", 
"pos", "pos", "A", "B", "C", "D", "neg", "neg", "pos", "pos", 
"A", "B", "C", "D", "neg", "neg", "pos", "pos"), value = c(3L, 
5L, 7L, 9L, 12L, 10L, 1L, 2L, 2L, 4L, 6L, 8L, 12L, 10L, 1L, 2L, 
3L, 4L, 6L, 9L, 12L, 10L, 1L, 2L)), class = "data.frame", row.names = c(NA, -24L))

ggplot(data = after, aes(x = treatment, y = value)) + geom_boxplot() + facet_wrap(~group)

Upvotes: 0

Views: 40

Answers (1)

Mario Niepel

Reputation: 1165

I think I sorted out a solution. It uses nested data frames and a neat property of full_join(): a control entry that matches several rows on the join keys is repeated for each of them, so the controls end up next to every treatment they belong to.
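To illustrate the join behaviour this relies on, here is a minimal sketch with made-up data. The names `treatments`, `controls`, `condition`, `drug`, and `ctrl` are hypothetical and only serve the illustration; they are not the columns of the real data set.

```r
library(dplyr)

# Hypothetical toy data: two treatment rows per condition ...
treatments <- tibble(
  condition = c("c1", "c1", "c2", "c2"),
  drug      = c("A", "B", "A", "B")
)
# ... and exactly one control value per condition
controls <- tibble(
  condition = c("c1", "c2"),
  ctrl      = c(10, 20)
)

# Each control row is repeated for every treatment row that shares its key
joined <- full_join(treatments, controls, by = "condition")
joined
```

Every row of `treatments` picks up the control value of its own condition, which is the effect the nested version exploits with a list-column instead of a single number.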

In my code, Drug_ID == "DMSO" denotes the control treatment that is supposed to be propagated across all other Drug_IDs. The columns Cell_Line_ID, DOX_ID, and time hold the additional grouping variables, each of which has its own control values for each individual condition.

Now this nicely allows me to plot the controls in each facet of the plot, at each time point and condition. My last issue is getting more control over how the control value is drawn: if it overlaps with a bunch of other measurements, it is really hard to see. ggplot could use a 'bring_to_front' function for specific elements.

[image: plot with the controls shown in each facet]
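On the overlap issue, one possible workaround: ggplot2 draws layers in the order they are added to the plot, so putting the controls into their own layer, added last, draws them on top of the other points. The sketch below uses hypothetical toy data, and the `is_ctrl` flag is an assumption, not a column of the real data set.

```r
library(ggplot2)

# Hypothetical toy data; is_ctrl marks the control rows
df <- data.frame(
  x       = c(1, 2, 3, 2),
  y       = c(5, 6, 7, 6),
  is_ctrl = c(FALSE, FALSE, FALSE, TRUE)
)

p <- ggplot(mapping = aes(x = x, y = y)) +
  # treatments are added first, so they end up underneath
  geom_point(data = subset(df, !is_ctrl), colour = "grey50", size = 3) +
  # controls are added last, so they are drawn over overlapping points
  geom_point(data = subset(df, is_ctrl), colour = "red", size = 3)
p
```

The same idea should carry over to faceted plots, since each layer is split into facets by its own data.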

library(tidyverse)

#// generate a nested data frame that contains only the relevant controls
temp_ctrls <- data %>%
     #// group by variables with separate DMSO controls
     group_by(Cell_Line_ID, DOX_ID, time) %>%
     #// identify and filter all controls
     filter(Drug_control != "none") %>%
     #// remove column for Drug_ID
     select(-Drug_ID) %>%
     #// split groups into individual lists
     nest()
#// rename the nested 'data' column to 'ctrl'
temp_ctrls <- rename(temp_ctrls, ctrl = data)

#// generate a nested data frame that contains all data that needs appending
data_list <- data %>%
     #// group by variables now including Drug_ID
     group_by(Cell_Line_ID, DOX_ID, time, Drug_ID) %>%
     #// split groups into individual lists
     nest()

#// merge the two nested data frames by Cell_Line_ID, DOX_ID, and time
data_m <- full_join(data_list, temp_ctrls, by = c("Cell_Line_ID", "DOX_ID", "time"))

#// remove all rows with Drug_ID "DMSO"
data_m <- filter(data_m, Drug_ID != "DMSO")

#// assemble control and data and unnest
data_m <- data_m %>%
     #// create new list column with merged data + ctrl
     mutate(merged = map2(data, ctrl, rbind)) %>%
     #// remove extraneous data columns
     select(-data, -ctrl) %>%
     #// unnest everything into a single data frame
     #// (tidyr >= 1.0 requires naming the list-column)
     unnest(merged)

#// clean-up
rm(temp_ctrls, data_list)
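Since the real data set is not shown, here is a self-contained sketch of the same steps on a small made-up table. The values, the "cl1"/"dox"/"drugA"/"drugB" labels, and the "ctrl" marker in Drug_control are assumptions for illustration only; the column names follow the code above. It uses nest() with explicit column names instead of group_by(), which is a swapped-in variant rather than the exact code from my analysis.

```r
library(tidyverse)

# Hypothetical toy data: one cell line, two time points, DMSO as the control
data <- tribble(
  ~Cell_Line_ID, ~DOX_ID, ~time, ~Drug_ID, ~Drug_control, ~value,
  "cl1",         "dox",   0,     "DMSO",   "ctrl",        1.0,
  "cl1",         "dox",   0,     "drugA",  "none",        0.8,
  "cl1",         "dox",   0,     "drugB",  "none",        0.6,
  "cl1",         "dox",   24,    "DMSO",   "ctrl",        1.1,
  "cl1",         "dox",   24,    "drugA",  "none",        0.5,
  "cl1",         "dox",   24,    "drugB",  "none",        0.3
)

keys <- c("Cell_Line_ID", "DOX_ID", "time")

# controls, nested per condition (Drug_ID dropped so it can be inherited)
ctrls <- data %>%
  filter(Drug_control != "none") %>%
  select(-Drug_ID) %>%
  nest(ctrl = -all_of(keys))

# all data, nested per condition and drug
data_list <- data %>%
  nest(data = -all_of(c(keys, "Drug_ID")))

# join, drop the DMSO rows, stack each treatment with its control, unnest
result <- data_list %>%
  full_join(ctrls, by = keys) %>%
  filter(Drug_ID != "DMSO") %>%
  mutate(merged = map2(data, ctrl, bind_rows)) %>%
  select(-data, -ctrl) %>%
  unnest(merged)
result
```

In `result`, every drug at every time point carries its own rows plus a copy of the matching DMSO rows, with Drug_control telling the two apart.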

Upvotes: 1
