NamAshena

Reputation: 1747

How can I divide values in a column by a specific row in R?

This is a subset of my large data:

gene    feature reads
A       anot    2
A       3ss_A   3
A       3ss_B   5
B       5ss_A   1
B       anot    4
C       3ss_A   2
C       3ss_B   8
C       anot    3
C       5ss_A   6

For each gene, I want to divide the reads of its 3ss and 5ss features by the reads of that gene's "anot" feature. Each gene has multiple features (not all shown here), but each gene has exactly one "anot" feature.

The expected output is:

gene    feature reads   ratio
A       anot    2       1
A       3ss_A   3       1.5
A       3ss_B   5       2.5
B       5ss_A   1       0.25
B       anot    4       1
C       3ss_A   2       0.666666667
C       3ss_B   8       2.666666667
C       anot    3       1
C       5ss_A   6       2

How could I do this in R? Thanks.

Upvotes: 2

Views: 2615

Answers (3)

G. Grothendieck

Reputation: 270268

Here are a variety of alternatives:

1) ave Use ave like this. The function fun is passed the vector of row numbers for one gene and returns the vector of ratios for it. No packages are used.

fun <- function(ix) with(DF[ix, ], reads / reads[feature == "anot"])
transform(DF, ratio = ave(1:nrow(DF), gene, FUN = fun))

giving:

  gene feature reads     ratio
1    A    anot     2 1.0000000
2    A   3ss_A     3 1.5000000
3    A   3ss_B     5 2.5000000
4    B   5ss_A     1 0.2500000
5    B    anot     4 1.0000000
6    C   3ss_A     2 0.6666667
7    C   3ss_B     8 2.6666667
8    C    anot     3 1.0000000
9    C   5ss_A     6 2.0000000
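To make the mechanism of approach 1 concrete, the sketch below rebuilds the question's data with data.frame() (base R only) and shows the groups of row numbers that ave() hands to fun, one group per gene. It assumes gene is stored as character, which is what read.table() produces by default in R 4.0 and later; the same code also works when gene is a factor.

```r
# The question's data, built directly instead of via read.table()
DF <- data.frame(
  gene    = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  feature = c("anot", "3ss_A", "3ss_B", "5ss_A", "anot",
              "3ss_A", "3ss_B", "anot", "5ss_A"),
  reads   = c(2, 3, 5, 1, 4, 2, 8, 3, 6)
)

# ave() splits its first argument by gene and calls FUN once per piece,
# so fun receives these row-number vectors: list(A = 1:3, B = 4:5, C = 6:9)
groups <- split(seq_len(nrow(DF)), DF$gene)
print(groups)

# fun uses the row numbers to pull that gene's rows and divide by its anot reading
fun <- function(ix) with(DF[ix, ], reads / reads[feature == "anot"])
ratio <- ave(seq_len(nrow(DF)), DF$gene, FUN = fun)
print(ratio)
```

Passing row numbers rather than reads is what lets fun see the feature column as well; ave() itself only ever splits a single vector.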

1a) ave Here is another approach using ave. It replaces each non-anot reading with NA; within each gene, na.omit then returns the single non-NA value (the anot reading), which ave recycles across that gene's rows, and reads is divided by the result:

transform(DF, ratio = 
  reads / ave(ifelse(feature == "anot", reads, NA), gene, FUN = na.omit))

giving:

  gene feature reads     ratio
1    A    anot     2 1.0000000
2    A   3ss_A     3 1.5000000
3    A   3ss_B     5 2.5000000
4    B   5ss_A     1 0.2500000
5    B    anot     4 1.0000000
6    C   3ss_A     2 0.6666667
7    C   3ss_B     8 2.6666667
8    C    anot     3 1.0000000
9    C   5ss_A     6 2.0000000
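The sketch below breaks approach 1a into its two intermediate vectors on the question's data, so it is easier to see what na.omit contributes inside ave(). It assumes reads is a plain numeric column, as in the data.frame() call.

```r
DF <- data.frame(
  gene    = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  feature = c("anot", "3ss_A", "3ss_B", "5ss_A", "anot",
              "3ss_A", "3ss_B", "anot", "5ss_A"),
  reads   = c(2, 3, 5, 1, 4, 2, 8, 3, 6)
)

# Keep the anot reading, blank out every other row:
# c(2, NA, NA, NA, 4, NA, NA, 3, NA)
anot_only <- ifelse(DF$feature == "anot", DF$reads, NA)

# Within each gene na.omit() leaves one value, and ave() recycles it over
# that gene's rows: c(2, 2, 2, 4, 4, 3, 3, 3, 3)
denom <- ave(anot_only, DF$gene, FUN = na.omit)

DF$ratio <- DF$reads / denom
print(DF)
```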

1b) ave Here is another ave variation. This one is particularly concise but does assume that the reads value of anot is always non-negative (which is the case in the example in the question). It creates a vector equal to reads for anot and zero otherwise and then takes the maximum within each gene:

transform(DF, ratio = reads / ave((feature == "anot") * reads, gene, FUN = max))

giving:

  gene feature reads     ratio
1    A    anot     2 1.0000000
2    A   3ss_A     3 1.5000000
3    A   3ss_B     5 2.5000000
4    B   5ss_A     1 0.2500000
5    B    anot     4 1.0000000
6    C   3ss_A     2 0.6666667
7    C   3ss_B     8 2.6666667
8    C    anot     3 1.0000000
9    C   5ss_A     6 2.0000000
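The sketch below shows the masked vector that approach 1b passes to ave(), and why the non-negativity assumption matters: if an anot reading were negative, max() would pick one of the zeros instead. The negative value is made up purely to illustrate that failure.

```r
DF <- data.frame(
  gene    = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  feature = c("anot", "3ss_A", "3ss_B", "5ss_A", "anot",
              "3ss_A", "3ss_B", "anot", "5ss_A"),
  reads   = c(2, 3, 5, 1, 4, 2, 8, 3, 6)
)

# Logical times numeric: the reading on the anot row, 0 on every other row
masked <- (DF$feature == "anot") * DF$reads
denom  <- ave(masked, DF$gene, FUN = max)

# Hypothetical negative anot reading for gene A, to show the caveat
neg <- DF
neg$reads[1] <- -2
neg_denom <- ave((neg$feature == "anot") * neg$reads, neg$gene, FUN = max)
# For gene A, max(-2, 0, 0) is 0, so the denominator becomes 0 instead of -2
print(neg_denom[1:3])
```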

2) by An alternative, also not using any packages, is to use by. Here the function funby takes a subset of rows of DF and returns that subset with a ratio column appended.

funby <- function(x) transform(x, ratio = reads / reads[feature == "anot"])
do.call("rbind", by(DF, DF$gene, funby))

giving:

    gene feature reads     ratio
A.1    A    anot     2 1.0000000
A.2    A   3ss_A     3 1.5000000
A.3    A   3ss_B     5 2.5000000
B.4    B   5ss_A     1 0.2500000
B.5    B    anot     4 1.0000000
C.6    C   3ss_A     2 0.6666667
C.7    C   3ss_B     8 2.6666667
C.8    C    anot     3 1.0000000
C.9    C   5ss_A     6 2.0000000
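If the group-prefixed row names (A.1, A.2, ...) are unwanted, they can be reset after the rbind. A minimal sketch on the question's data; note that by() returns the groups in gene order, so the rows come back in the original order only because the example is already sorted by gene.

```r
DF <- data.frame(
  gene    = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  feature = c("anot", "3ss_A", "3ss_B", "5ss_A", "anot",
              "3ss_A", "3ss_B", "anot", "5ss_A"),
  reads   = c(2, 3, 5, 1, 4, 2, 8, 3, 6)
)

funby <- function(x) transform(x, ratio = reads / reads[feature == "anot"])
res <- do.call("rbind", by(DF, DF$gene, funby))

# rbind() names the rows "A.1", "A.2", ...; reset them to plain 1, 2, ...
rownames(res) <- NULL
print(res)
```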

3) rep/table This also uses no packages. It assumes that DF is sorted by gene (which is the case in the example in the question) and that each gene has exactly one anot row. It repeats each gene's anot reading as many times as that gene has rows and then divides reads by the result.

transform(DF, ratio = reads / rep(reads[feature == "anot"], table(gene)))

giving:

  gene feature reads     ratio
1    A    anot     2 1.0000000
2    A   3ss_A     3 1.5000000
3    A   3ss_B     5 2.5000000
4    B   5ss_A     1 0.2500000
5    B    anot     4 1.0000000
6    C   3ss_A     2 0.6666667
7    C   3ss_B     8 2.6666667
8    C    anot     3 1.0000000
9    C   5ss_A     6 2.0000000
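When the data are not already sorted by gene, one option is to sort first. The sketch below puts the example rows into an arbitrary (made-up) order, sorts them with order(), applies the rep/table expression, and then reads the ratios back by the original row names. It still assumes exactly one anot row per gene.

```r
DF <- data.frame(
  gene    = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  feature = c("anot", "3ss_A", "3ss_B", "5ss_A", "anot",
              "3ss_A", "3ss_B", "anot", "5ss_A"),
  reads   = c(2, 3, 5, 1, 4, 2, 8, 3, 6)
)

# Hypothetical unsorted input: same rows, shuffled
unsorted <- DF[c(9, 1, 5, 2, 6, 4, 3, 8, 7), ]

# Sort by gene so the anot readings line up with the counts from table(gene)
sorted <- unsorted[order(unsorted$gene), ]
sorted$ratio <- with(sorted, reads / rep(reads[feature == "anot"], table(gene)))

# Row names still refer to the original rows, so the original order is recoverable
print(sorted[as.character(1:9), ])
```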

4) dplyr Using the dplyr package:

library(dplyr)

DF %>% 
   group_by(gene) %>% 
   mutate(ratio = reads / reads[feature == "anot"]) %>% 
   ungroup()

giving:

Source: local data frame [9 x 4]

    gene feature reads     ratio
  (fctr)  (fctr) (int)     (dbl)
1      A    anot     2 1.0000000
2      A   3ss_A     3 1.5000000
3      A   3ss_B     5 2.5000000
4      B   5ss_A     1 0.2500000
5      B    anot     4 1.0000000
6      C   3ss_A     2 0.6666667
7      C   3ss_B     8 2.6666667
8      C    anot     3 1.0000000
9      C   5ss_A     6 2.0000000

5) data.table Using the data.table package:

library(data.table)

DT <- as.data.table(DF)
DT[, ratio := reads / reads[feature == "anot"], by = "gene"]

giving:

> DT
   gene feature reads     ratio
1:    A    anot     2 1.0000000
2:    A   3ss_A     3 1.5000000
3:    A   3ss_B     5 2.5000000
4:    B   5ss_A     1 0.2500000
5:    B    anot     4 1.0000000
6:    C   3ss_A     2 0.6666667
7:    C   3ss_B     8 2.6666667
8:    C    anot     3 1.0000000
9:    C   5ss_A     6 2.0000000

Note: The input DF in reproducible form is:

Lines <- "gene    feature reads
A       anot    2
A       3ss_A   3
A       3ss_B   5
B       5ss_A   1
B       anot    4
C       3ss_A   2
C       3ss_B   8
C       anot    3
C       5ss_A   6"
DF <- read.table(text = Lines, header = TRUE)
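Every approach above divides by the single anot reading of each gene, so it can be worth confirming that assumption before running them on the full data. A small base-R sketch; it rebuilds DF with data.frame() so it runs on its own.

```r
DF <- data.frame(
  gene    = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  feature = c("anot", "3ss_A", "3ss_B", "5ss_A", "anot",
              "3ss_A", "3ss_B", "anot", "5ss_A"),
  reads   = c(2, 3, 5, 1, 4, 2, 8, 3, 6)
)

# Number of anot rows per gene; a gene with no anot row shows up as 0
n_anot <- tapply(DF$feature == "anot", DF$gene, sum)
print(n_anot)

# Stop early if any gene has zero or several anot rows
stopifnot(all(n_anot == 1))
```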

Upvotes: 11

Vincent Bonhomme

Reputation: 7453

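You can use, in base R (assuming the data frame is called df):

# unique() also works when gene is a character column (read.table's default since R 4.0)
df$ratio <- unlist(lapply(unique(df$gene),
    function(l) with(subset(df, gene == l), reads / reads[feature == "anot"])))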

  gene feature reads     ratio
1    A    anot     2 1.0000000
2    A   3ss_A     3 1.5000000
3    A   3ss_B     5 2.5000000
4    B   5ss_A     1 0.2500000
5    B    anot     4 1.0000000
6    C   3ss_A     2 0.6666667
7    C   3ss_B     8 2.6666667
8    C    anot     3 1.0000000
9    C   5ss_A     6 2.0000000

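In words: for each gene, subset df to that gene and divide its reads by the reads value of its anot row; then unlist the per-gene results and store them as a new column of the data.frame. Because the pieces are concatenated gene by gene, this relies on the rows being ordered by gene, as they are in the example.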

But there is probably a shorter option.
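A closely related base-R variant that does not depend on row order uses split() and unsplit(): unsplit() writes each gene's ratios back to that gene's original row positions. This is a sketch on the question's data, not the answerer's own code; add_ratio is a hypothetical helper name.

```r
DF <- data.frame(
  gene    = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  feature = c("anot", "3ss_A", "3ss_B", "5ss_A", "anot",
              "3ss_A", "3ss_B", "anot", "5ss_A"),
  reads   = c(2, 3, 5, 1, 4, 2, 8, 3, 6)
)

# Hypothetical helper: per-gene ratios, placed back in the input's row order
add_ratio <- function(d) {
  pieces <- lapply(split(d, d$gene),
                   function(g) g$reads / g$reads[g$feature == "anot"])
  d$ratio <- unsplit(pieces, d$gene)
  d
}

res      <- add_ratio(DF)
# Same rows in a shuffled order give the same ratio for each original row
shuffled <- add_ratio(DF[c(9, 1, 5, 2, 6, 4, 3, 8, 7), ])
print(res)
```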

Upvotes: 0

Henry

Reputation: 6784

You could try something like

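# anot reading of each gene, named by gene
anot_reads        <- yourdata[yourdata$feature == "anot", ]$reads
names(anot_reads) <- as.character(yourdata[yourdata$feature == "anot", ]$gene)
# as.character() matters if gene is a factor: indexing with a factor uses its
# integer codes, not its labels
yourdata$ratio    <- yourdata$reads / anot_reads[as.character(yourdata$gene)]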
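The named-vector lookup can also be written with match(), which converts factors to character before matching and so gives the same result whether gene is a factor or a character column. A minimal sketch on the question's data, with gene deliberately stored as a factor here:

```r
DF <- data.frame(
  gene    = factor(c("A", "A", "A", "B", "B", "C", "C", "C", "C")),
  feature = c("anot", "3ss_A", "3ss_B", "5ss_A", "anot",
              "3ss_A", "3ss_B", "anot", "5ss_A"),
  reads   = c(2, 3, 5, 1, 4, 2, 8, 3, 6)
)

is_anot <- DF$feature == "anot"
# Position of each row's gene among the anot rows: 1, 1, 1, 2, 2, 3, 3, 3, 3
idx <- match(DF$gene, DF$gene[is_anot])
DF$ratio <- DF$reads / DF$reads[is_anot][idx]
print(DF)
```

Like the other answers, this assumes each gene has exactly one anot row; with more than one, match() silently uses the first.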

Upvotes: 0
