Reputation: 402
Following up on this question: How to divide between groups of rows using dplyr?
If I have this data frame:
id = c("a","a","b","b","c","c")
condition = c(0,1,0,1,0,1)
gene1 = sample(1:100,6)
gene2 = sample(1:100,6)
#...
geneN = sample(1:100,6)
df = data.frame(id,condition,gene1,gene2,geneN)
I want to group by id and divide the values in the rows with condition == 0 by those in the rows with condition == 1, to get this:
df[condition == 0, 3:5] / df[condition == 1, 3:5]
#       gene1     gene2     geneN
# 1 0.2187500 0.4946237 0.3750000
# 3 0.4700000 0.6382979 0.5444444
# 5 0.7674419 0.5471698 2.3750000
I can use dplyr as follows:
library(dplyr)

df %>%
  group_by(id) %>%
  summarise(gene1 = gene1[condition == 0] / gene1[condition == 1],
            gene2 = gene2[condition == 0] / gene2[condition == 1],
            geneN = geneN[condition == 0] / geneN[condition == 1])
But I actually have around 100 such variables, as in the example below. How can I do this without listing every gene?
id = c("a","a","b","b","c","c")
condition = c(0,1,0,1,0,1)
genes = matrix(1:600,ncol = 100)
df = data.frame(id,condition,genes)
Upvotes: 1
Views: 473
Reputation: 47350
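If your dataset is sorted (within each id, the condition == 0 row comes right before the condition == 1 row) and has no irregularities, you can do this with purrr::map_dfr:
library(dplyr)
library(purrr)

# c(TRUE, FALSE) recycles to pick the condition == 0 rows, c(FALSE, TRUE) the condition == 1 rows
df[paste0("gene", c(1, 2, "N"))] %>% map_dfr(~ .x[c(TRUE, FALSE)] / .x[c(FALSE, TRUE)])
# # A tibble: 3 x 3
#       gene1    gene2      geneN
#       <dbl>    <dbl>      <dbl>
# 1 0.1764706 1.323944 38.5000000
# 2 0.4895833 0.531250  0.3478261
# 3 0.3278689 2.705882  1.2424242
Or its base R equivalent:
as.data.frame(lapply(df[paste0("gene", c(1, 2, "N"))],
                     function(x) x[c(TRUE, FALSE)] / x[c(FALSE, TRUE)]))
You may need to bind the id column back on; I skipped this step because it is not in your expected output.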
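The approach above relies on row position. A minimal sketch that instead matches each condition == 1 row to its condition == 0 row by id, and picks up every gene column without listing them, might look like this. It assumes each id has exactly one row per condition; the names gene_cols, ctrl, trt and ratios are illustrative, not from the original answer.

```r
# Data from the question: 100 gene columns, auto-named X1..X100
df <- data.frame(
  id = c("a", "a", "b", "b", "c", "c"),
  condition = c(0, 1, 0, 1, 0, 1),
  matrix(1:600, ncol = 100)
)

# Every column except the identifiers is treated as a gene (hypothetical helper name)
gene_cols <- setdiff(names(df), c("id", "condition"))

ctrl <- df[df$condition == 0, ]
trt  <- df[df$condition == 1, ]
# Reorder the condition == 1 rows so they line up with ctrl by id,
# which makes the result independent of the original row order
trt <- trt[match(ctrl$id, trt$id), ]

# Element-wise division of two equally sized data frames, with id bound back on
ratios <- data.frame(id = ctrl$id, ctrl[gene_cols] / trt[gene_cols], row.names = NULL)
ratios[, 1:4]
```

Because the rows are aligned with match() rather than by position, shuffling the rows of df before running this should give the same ratios.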
Upvotes: 0
Reputation: 17668
You can try
library(dplyr)
library(tidyr)

df %>%
  gather(k, v, -id, -condition) %>%
  spread(condition, v) %>%
  mutate(ratio = `0` / `1`) %>%
  select(id, k, ratio) %>%
  spread(k, ratio)
#   id      gene1     gene2    geneN
# 1  a  0.3670886 0.5955056 1.192982
# 2  b  0.4767442 1.2222222 0.125000
# 3  c 18.2000000 2.0909091 6.000000
I used your data, generated after calling set.seed(123).
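In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A sketch of the same reshape, divide, reshape idea with those functions, run on the 100-gene data from the question, could look like this; the "cond" prefix and the gene/value/ratio column names are arbitrary choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  id = c("a", "a", "b", "b", "c", "c"),
  condition = c(0, 1, 0, 1, 0, 1),
  matrix(1:600, ncol = 100)
)

ratios <- df %>%
  # long format: one row per id / condition / gene
  pivot_longer(-c(id, condition), names_to = "gene", values_to = "value") %>%
  # one column per condition, named cond0 and cond1
  pivot_wider(names_from = condition, values_from = value, names_prefix = "cond") %>%
  mutate(ratio = cond0 / cond1) %>%
  select(id, gene, ratio) %>%
  # back to one column per gene
  pivot_wider(names_from = gene, values_from = ratio)

ratios[, 1:4]
```

Unlike spread(), pivot_wider() keeps the gene columns in order of first appearance, so X1, X2, ..., X100 are not sorted alphabetically into X1, X10, X100, ...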
Upvotes: 1
Reputation: 39174
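We can use summarise_at to apply the same operation to many columns. Sorting by condition first puts the condition == 0 row first and the condition == 1 row last within each id.
library(dplyr)

df2 <- df %>%
  group_by(id) %>%
  arrange(condition) %>%
  summarise_at(vars(-condition), ~ first(.) / last(.)) %>%
  ungroup()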
df2
# # A tibble: 3 x 4
# id gene1 gene2 geneN
# <fct> <dbl> <dbl> <dbl>
# 1 a 0.524 2.28 0.654
# 2 b 1.65 0.616 1.38
# 3 c 0.578 2.00 2.17
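In dplyr 1.0.0 and later, summarise_at() is superseded by across(). A sketch that also selects rows by condition explicitly, as in the question's three-gene version, instead of relying on the sort order, might be the following. It assumes exactly one condition == 0 row and one condition == 1 row per id.

```r
library(dplyr)

df <- data.frame(
  id = c("a", "a", "b", "b", "c", "c"),
  condition = c(0, 1, 0, 1, 0, 1),
  matrix(1:600, ncol = 100)
)

df2 <- df %>%
  group_by(id) %>%
  # across() applies the lambda to every column except condition (the grouping
  # column id is skipped automatically); condition is still visible inside the
  # lambda and refers to the current group's rows
  summarise(across(-condition, ~ .x[condition == 0] / .x[condition == 1]))

df2[, 1:4]
```

This is the question's gene1 = gene1[condition == 0] / gene1[condition == 1] pattern, generalized to all columns at once.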
Upvotes: 3