Reputation: 1747
This is a subset of my large data:
gene feature reads
A anot 2
A 3ss_A 3
A 3ss_B 5
B 5ss_A 1
B anot 4
C 3ss_A 2
C 3ss_B 8
C anot 3
C 5ss_A 6
Within each gene, I want to divide the reads of the 3ss and 5ss features by the reads of that gene's "anot" feature. Each gene has multiple features (not all shown here) but only one "anot" feature.
The expected output is:
gene feature reads ratio
A anot 2 1
A 3ss_A 3 1.5
A 3ss_B 5 2.5
B 5ss_A 1 0.25
B anot 4 1
C 3ss_A 2 0.666666667
C 3ss_B 8 2.666666667
C anot 3 1
C 5ss_A 6 2
How could I do this in R? Thanks.
Upvotes: 2
Views: 2615
Reputation: 270268
Here are a variety of alternatives:
1) ave Use ave like this. The function fun is passed the vector of row numbers for one gene and returns the vector of ratios for it. No packages are used.
fun <- function(ix) with(DF[ix, ], reads / reads[feature == "anot"])
transform(DF, ratio = ave(1:nrow(DF), gene, FUN = fun))
giving:
gene feature reads ratio
1 A anot 2 1.0000000
2 A 3ss_A 3 1.5000000
3 A 3ss_B 5 2.5000000
4 B 5ss_A 1 0.2500000
5 B anot 4 1.0000000
6 C 3ss_A 2 0.6666667
7 C 3ss_B 8 2.6666667
8 C anot 3 1.0000000
9 C 5ss_A 6 2.0000000
1a) ave Here is another approach using ave. It replaces each non-anot reading with NA and then, within each gene, divides the readings by the one non-NA value, which is extracted using na.omit:
transform(DF, ratio =
reads / ave(ifelse(feature == "anot", reads, NA), gene, FUN = na.omit))
giving:
gene feature reads ratio
1 A anot 2 1.0000000
2 A 3ss_A 3 1.5000000
3 A 3ss_B 5 2.5000000
4 B 5ss_A 1 0.2500000
5 B anot 4 1.0000000
6 C 3ss_A 2 0.6666667
7 C 3ss_B 8 2.6666667
8 C anot 3 1.0000000
9 C 5ss_A 6 2.0000000
1b) ave Here is another ave variation. This one is particularly concise but assumes that the reads value of anot is always non-negative (which is the case in the example in the question). It creates a vector equal to reads for anot and zero otherwise, and then takes the maximum within each gene:
transform(DF, ratio = reads / ave((feature == "anot") * reads, gene, FUN = max))
giving:
gene feature reads ratio
1 A anot 2 1.0000000
2 A 3ss_A 3 1.5000000
3 A 3ss_B 5 2.5000000
4 B 5ss_A 1 0.2500000
5 B anot 4 1.0000000
6 C 3ss_A 2 0.6666667
7 C 3ss_B 8 2.6666667
8 C anot 3 1.0000000
9 C 5ss_A 6 2.0000000
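To make the intermediate steps of 1b) visible, here is a small sketch on the question's data. The names masked and denom are introduced here for illustration only and are not part of the answer above.

```r
# Question's data, rebuilt so the sketch is self-contained
DF <- data.frame(
  gene    = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  feature = c("anot", "3ss_A", "3ss_B", "5ss_A", "anot",
              "3ss_A", "3ss_B", "anot", "5ss_A"),
  reads   = c(2, 3, 5, 1, 4, 2, 8, 3, 6)
)

# reads for anot rows, zero everywhere else: 2 0 0 0 4 0 0 3 0
masked <- with(DF, (feature == "anot") * reads)

# per-gene maximum, repeated for every row of the gene: 2 2 2 4 4 3 3 3 3
denom <- ave(masked, DF$gene, FUN = max)

DF$ratio <- DF$reads / denom
```

The maximum picks out the anot value only because every other entry is zero, which is why a negative anot reading would break this variant.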
2) by An alternative, also not using any packages, is to use by. Here the function funby takes a subset of the rows of DF and returns that subset with the ratio column appended.
funby <- function(x) transform(x, ratio = reads / reads[feature == "anot"])
do.call("rbind", by(DF, DF$gene, funby))
giving:
gene feature reads ratio
A.1 A anot 2 1.0000000
A.2 A 3ss_A 3 1.5000000
A.3 A 3ss_B 5 2.5000000
B.4 B 5ss_A 1 0.2500000
B.5 B anot 4 1.0000000
C.6 C 3ss_A 2 0.6666667
C.7 C 3ss_B 8 2.6666667
C.8 C anot 3 1.0000000
C.9 C 5ss_A 6 2.0000000
3) rep/table This also uses no packages. It assumes that DF is sorted by gene (which is the case in the example in the question). It repeats each anot reading once for every row in that gene and then divides reads by the result.
transform(DF, ratio = reads / rep(reads[feature == "anot"], table(gene)))
giving:
gene feature reads ratio
1 A anot 2 1.0000000
2 A 3ss_A 3 1.5000000
3 A 3ss_B 5 2.5000000
4 B 5ss_A 1 0.2500000
5 B anot 4 1.0000000
6 C 3ss_A 2 0.6666667
7 C 3ss_B 8 2.6666667
8 C anot 3 1.0000000
9 C 5ss_A 6 2.0000000
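If the rows are not guaranteed to be sorted by gene, one possible workaround (a sketch, not part of the original answer) is to sort first, compute the ratios as in 3), and then write them back in the original row order. DF2, g, o, S and ratio_sorted are names made up for this illustration.

```r
# Hypothetical unsorted input: the question's rows in shuffled order
DF2 <- data.frame(
  gene    = c("C", "A", "B", "C", "A", "C", "B", "A", "C"),
  feature = c("anot", "3ss_A", "anot", "3ss_A", "anot",
              "5ss_A", "5ss_A", "3ss_B", "3ss_B"),
  reads   = c(3, 3, 4, 2, 2, 6, 1, 5, 8)
)

g <- factor(DF2$gene)   # one factor drives both the ordering and the counts
o <- order(g)           # permutation that groups rows by gene level
S <- DF2[o, ]           # sorted copy

# Same idea as 3): repeat each gene's anot reads once per row of that gene
ratio_sorted <- S$reads / rep(S$reads[S$feature == "anot"], table(g))

DF2$ratio <- numeric(nrow(DF2))
DF2$ratio[o] <- ratio_sorted   # undo the sort
```

Like 3) itself, this still assumes exactly one anot row per gene.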
4) dplyr Using the dplyr package:
library(dplyr)
DF %>%
group_by(gene) %>%
mutate(ratio = reads / reads[feature == "anot"]) %>%
ungroup()
giving:
Source: local data frame [9 x 4]
gene feature reads ratio
(fctr) (fctr) (int) (dbl)
1 A anot 2 1.0000000
2 A 3ss_A 3 1.5000000
3 A 3ss_B 5 2.5000000
4 B 5ss_A 1 0.2500000
5 B anot 4 1.0000000
6 C 3ss_A 2 0.6666667
7 C 3ss_B 8 2.6666667
8 C anot 3 1.0000000
9 C 5ss_A 6 2.0000000
5) data.table Using the data.table package:
library(data.table)
DT <- as.data.table(DF)
DT[, ratio := reads / reads[feature == "anot"], by = "gene"]
giving:
> DT
gene feature reads ratio
1: A anot 2 1.0000000
2: A 3ss_A 3 1.5000000
3: A 3ss_B 5 2.5000000
4: B 5ss_A 1 0.2500000
5: B anot 4 1.0000000
6: C 3ss_A 2 0.6666667
7: C 3ss_B 8 2.6666667
8: C anot 3 1.0000000
9: C 5ss_A 6 2.0000000
Note: The input DF in reproducible form is:
Lines <- "gene feature reads
A anot 2
A 3ss_A 3
A 3ss_B 5
B 5ss_A 1
B anot 4
C 3ss_A 2
C 3ss_B 8
C anot 3
C 5ss_A 6"
DF <- read.table(text = Lines, header = TRUE)
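All of the alternatives above rely on each gene having exactly one "anot" row, as stated in the question. On a large data set it may be worth checking that before computing ratios. A minimal sketch, where anot_count and bad are names introduced here:

```r
# Same input as above, repeated so the check runs on its own
Lines <- "gene feature reads
A anot 2
A 3ss_A 3
A 3ss_B 5
B 5ss_A 1
B anot 4
C 3ss_A 2
C 3ss_B 8
C anot 3
C 5ss_A 6"
DF <- read.table(text = Lines, header = TRUE)

# Number of "anot" rows in each gene
anot_count <- tapply(DF$feature == "anot", DF$gene, sum)

# Genes with zero or several "anot" rows; empty if the assumption holds
bad <- names(anot_count)[anot_count != 1]
if (length(bad) > 0) stop("genes without a unique anot row: ",
                          paste(bad, collapse = ", "))
```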
Upvotes: 11
Reputation: 7453
You can use, in base R:
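# unique(as.character(...)) works whether gene is a factor or a character column;
# the rows of df must be grouped by gene for the results to line up
df$ratio <- unlist(lapply(unique(as.character(df$gene)),
function(l) with(subset(df, gene == l), reads / reads[feature == "anot"])))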
gene feature reads ratio
1 A anot 2 1.0000000
2 A 3ss_A 3 1.5000000
3 A 3ss_B 5 2.5000000
4 B 5ss_A 1 0.2500000
5 B anot 4 1.0000000
6 C 3ss_A 2 0.6666667
7 C 3ss_B 8 2.6666667
8 C anot 3 1.0000000
9 C 5ss_A 6 2.0000000
It translates as: for each value of gene, subset df and divide reads by the reads value where feature == "anot". Then you unlist the result and store it as a new column in your data.frame. Note that this relies on the rows of df being grouped by gene.
But there is probably a shorter option.
Upvotes: 0
Reputation: 6784
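You could try something like:
anot_reads <- yourdata[yourdata$feature == "anot",]$reads
names(anot_reads) <- yourdata[yourdata$feature == "anot",]$gene
# as.character() forces lookup by name; a factor index would use its integer codes instead
yourdata$ratio <- yourdata$reads / anot_reads[as.character(yourdata$gene)]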
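The idea is that anot_reads becomes a vector named by gene, so indexing it with the gene column looks up each row's denominator. A self-contained sketch on the question's data follows; yourdata is built here for illustration, is_anot is a helper name introduced here, and as.character() is used so the lookup is by name even if gene is a factor.

```r
yourdata <- data.frame(
  gene    = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  feature = c("anot", "3ss_A", "3ss_B", "5ss_A", "anot",
              "3ss_A", "3ss_B", "anot", "5ss_A"),
  reads   = c(2, 3, 5, 1, 4, 2, 8, 3, 6)
)

is_anot <- yourdata$feature == "anot"

# Named lookup vector: A = 2, B = 4, C = 3
anot_reads <- setNames(yourdata$reads[is_anot],
                       as.character(yourdata$gene[is_anot]))

# Look up each row's anot reads by gene name, dropping the names afterwards
yourdata$ratio <- yourdata$reads /
  unname(anot_reads[as.character(yourdata$gene)])
```

Unlike the rep/table approach, this lookup does not depend on the rows being sorted by gene.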
Upvotes: 0