Consider the following data
set.seed(123)
example.df <- data.frame(
  gene = sample(c("A", "B", "C", "D"), 100, replace = TRUE),
  treated = sample(c("Yes", "No"), 100, replace = TRUE),
  resp = rnorm(100, 10, 5), effect = rnorm(100, 25, 5))
I am trying to get the maximum value of each numeric variable for every pairwise comparison of gene levels, grouped by treated. I can create the gene combinations like so:
combn(sort(unique(example.df$gene)), 2, simplify = T)
#      [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] A    A    A    B    B    C
# [2,] B    C    D    C    D    D
# Levels: A B C D
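combn() also takes a function through its FUN argument, which it applies to each combination; extra arguments such as collapse are passed on to that function. A minimal sketch that builds the comparison labels directly, assuming the four gene names are typed out rather than taken from example.df:

```r
genes <- c("A", "B", "C", "D")

# FUN is applied to each 2-element combination; collapse = "-" is forwarded to paste()
labels <- combn(genes, 2, FUN = paste, collapse = "-")
print(labels)
# "A-B" "A-C" "A-D" "B-C" "B-D" "C-D"
```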
Edit: The output I am looking for is a data frame like this:
comparison group max.resp max.effect
A-B no value1 value2
....
C-D no valueX valueY
A-B yes value3 value4
....
C-D yes valueXX valueYY
While I am able to get the max values for each individual gene level grouped by treated...
library(tidyverse)
max.df <- example.df %>%
group_by(treated, gene) %>%
nest() %>%
mutate(mod = map(data, ~summarise_if(.x, is.numeric, max, na.rm = TRUE))) %>%
select(treated, gene, mod) %>%
unnest(mod) %>%
arrange(treated, gene)
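For reference, the same per-gene maxima can be computed in base R with aggregate(), without nesting. This is a sketch that swaps in base R instead of the tidyverse pipeline above; it rebuilds the example data with stringsAsFactors = FALSE so gene stays a character column:

```r
# Same example data as in the question; strings are kept as characters
set.seed(123)
example.df <- data.frame(
  gene    = sample(c("A", "B", "C", "D"), 100, replace = TRUE),
  treated = sample(c("Yes", "No"), 100, replace = TRUE),
  resp    = rnorm(100, 10, 5),
  effect  = rnorm(100, 25, 5),
  stringsAsFactors = FALSE
)

# Pick out the numeric columns (resp and effect here)
num_cols <- names(example.df)[vapply(example.df, is.numeric, logical(1))]

# Maximum of each numeric column within every treated x gene cell;
# na.rm = TRUE is forwarded to max() through aggregate's ... argument
max.df <- aggregate(example.df[num_cols],
                    by = list(treated = example.df$treated,
                              gene = example.df$gene),
                    FUN = max, na.rm = TRUE)
max.df <- max.df[order(max.df$treated, max.df$gene), ]
max.df
```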
Despite trying to tackle the issue for more than a day, I cannot figure out how to get the max for each numeric variable for each 2 level gene comparison (A vs B, A vs C, A vs D, B vs C, B vs D, and C vs D) grouped by treated.
Any help is appreciated. Thanks.
I found a solution. It might be a little messy, and I will try to update it with a cleaner version, but it runs almost instantly.
library(tidyverse)
First, I generate a data frame with two columns, Gen1 and Gen2, containing all possible comparisons. This is very similar to your use of combn, but it creates a data.frame. Note that it contains each pair in both orders (A-B and B-A):
GeneComp <- expand.grid(Gen1 = unique(example.df$gene), Gen2 = unique(example.df$gene)) %>% filter(Gen1 != Gen2) %>% arrange(Gen1)
Then I loop through it, grouping by treated:
Comps <- list()
for(i in 1:nrow(GeneComp)){
  Comps[[i]] <- example.df %>% filter(gene == GeneComp[i,]$Gen1 | gene == GeneComp[i,]$Gen2) %>% # keep only the rows for the two genes in the ith row
    group_by(treated) %>% # then group by treated
    summarise_if(is.numeric, max) %>% # then take the max of each numeric column
    mutate(Comparison = paste(GeneComp[i,]$Gen1, GeneComp[i,]$Gen2, sep = "-")) # and create the Comparison variable
}
Comps <- bind_rows(Comps) # finally, bind everything into one data frame
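For comparison, here is a minimal base R sketch of the same idea. It is a swapped-in approach, not the loop above: it iterates over the six unordered pairs from combn() instead of the twelve ordered pairs from expand.grid(), so no comparison is produced twice, and it uses aggregate() for the per-treated maxima. The names gene_pairs, pair_max and result are illustrative:

```r
set.seed(123)
example.df <- data.frame(
  gene    = sample(c("A", "B", "C", "D"), 100, replace = TRUE),
  treated = sample(c("Yes", "No"), 100, replace = TRUE),
  resp    = rnorm(100, 10, 5),
  effect  = rnorm(100, 25, 5),
  stringsAsFactors = FALSE
)

# Each column of gene_pairs holds one unordered pair, e.g. c("A", "B")
gene_pairs <- combn(sort(unique(example.df$gene)), 2)

pair_max <- lapply(seq_len(ncol(gene_pairs)), function(j) {
  # Keep only the rows belonging to the two genes of this pair
  rows <- example.df[example.df$gene %in% gene_pairs[, j], ]
  # Per-treated maximum of resp and effect within those rows
  out <- aggregate(rows[c("resp", "effect")],
                   by = list(group = rows$treated), FUN = max)
  data.frame(comparison = paste(gene_pairs[, j], collapse = "-"),
             group      = out$group,
             max.resp   = out$resp,
             max.effect = out$effect,
             stringsAsFactors = FALSE)
})

# Stack the per-pair summaries and order them like the desired output
result <- do.call(rbind, pair_max)
result <- result[order(result$group, result$comparison), ]
rownames(result) <- NULL
result
```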
Let me know if it does everything you want.
It is important here that your genes are character strings and not factors (before R 4.0.0, data.frame() converted strings to factors by default), so you might have to do this:
options(stringsAsFactors = FALSE)
set.seed(123)
example.df <- data.frame(
  gene = sample(c("A", "B", "C", "D"), 100, replace = TRUE),
  treated = sample(c("Yes", "No"), 100, replace = TRUE),
  resp = rnorm(100, 10, 5), effect = rnorm(100, 25, 5))
Then add the stringsAsFactors = FALSE argument to expand.grid as well:
GeneComp <- expand.grid(Gen1 = unique(example.df$gene), Gen2 = unique(example.df$gene), stringsAsFactors = FALSE) %>% filter(Gen1 != Gen2) %>% arrange(Gen1)
This lets you sort both genes in the loop when pasting the Comparison variable, so that B-A and A-B both become A-B. The rows are then duplicated, but calling the distinct function at the end removes the duplicates and leaves your data the way you want it:
Comps <- list()
for(i in 1:nrow(GeneComp)){
  Comps[[i]] <- example.df %>% filter(gene == GeneComp[i,]$Gen1 | gene == GeneComp[i,]$Gen2) %>% # keep only the rows for the two genes in the ith row
    group_by(treated) %>% # then group by treated
    summarise_if(is.numeric, max) %>% # then take the max of each numeric column
    mutate(Comparison = paste(sort(c(GeneComp[i,]$Gen1, GeneComp[i,]$Gen2)), collapse = "-")) # sort the two genes so A-B and B-A get the same label
}
Comps <- bind_rows(Comps) %>% distinct() # bind into one data frame and drop the duplicated comparisons
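The label normalisation in the loop above can be checked on its own. A minimal base R sketch, assuming the four gene names are typed out: pmin() and pmax() are used here as a vectorised stand-in for sort(), putting the alphabetically smaller gene first in every row, and !duplicated() plays the role of distinct():

```r
genes <- c("A", "B", "C", "D")
GeneComp <- expand.grid(Gen1 = genes, Gen2 = genes, stringsAsFactors = FALSE)
GeneComp <- GeneComp[GeneComp$Gen1 != GeneComp$Gen2, ]  # 12 ordered pairs

# Smaller gene first, so "B-A" and "A-B" both become "A-B"
GeneComp$Comparison <- paste(pmin(GeneComp$Gen1, GeneComp$Gen2),
                             pmax(GeneComp$Gen1, GeneComp$Gen2), sep = "-")

# Keep the first occurrence of each label
unique_pairs <- GeneComp[!duplicated(GeneComp$Comparison), ]
unique_pairs$Comparison
# "A-B" "A-C" "A-D" "B-C" "B-D" "C-D"
```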