How to transform raw counts in a table to percent relative abundance in R or bash?

Question

I'm trying to transform the counts in each cell of the table below (filename=abundance_table) to percent relative abundance: each count should be divided by the total of its column (every column except the first) and then multiplied by 100.

Taxa                                        Sample1  Sample2
Eukaryota;Alveolata;Apicomplexa             1000     500
Eukaryota;Alveolata;Dinophyceae             2000     500
Eukaryota;Alveolata;Unclassified Alveolata  500      1000
Eukaryota;Choanoflagellida;Acanthoecidae    500      1000
Eukaryota;Choanoflagellida;Codonosigidae    1000     2000

I expect the output table to look exactly like this:

Taxa                                        Sample1  Sample2
Eukaryota;Alveolata;Apicomplexa             20       10
Eukaryota;Alveolata;Dinophyceae             40       10
Eukaryota;Alveolata;Unclassified Alveolata  10       20
Eukaryota;Choanoflagellida;Acanthoecidae    10       20
Eukaryota;Choanoflagellida;Codonosigidae    20       40

I'm new to R and tried the R code below, but it didn't give me the expected result. I would very much appreciate the correct R code for this, or a simple alternative in bash.

df <- read.table("abundance_table", header = TRUE, sep = "\t")
sum= colSums(df[,-1])
norm = df[,-1] / sum*100

Ronak Shah · Accepted Answer

Here are three base R solutions:

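#1. sweep() divides each column (margin 2) by its own column sum
df[-1] <- sweep(df[-1], 2, colSums(df[,-1]), `/`) * 100

#2. Transpose so that recycling runs along the samples, divide, then transpose back
df[-1] <- t(t(df[-1]) / colSums(df[,-1])) * 100

#3. prop.table() turns each column into proportions of its own total
df[-1] <- sapply(df[-1], prop.table) * 100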

All three return:

df
#                                       Taxa Sample1 Sample2
#1           Eukaryota;Alveolata;Apicomplexa      20      10
#2           Eukaryota;Alveolata;Dinophyceae      40      10
#3 Eukaryota;Alveolata;UnclassifiedAlveolata      10      20
#4  Eukaryota;Choanoflagellida;Acanthoecidae      10      20
#5  Eukaryota;Choanoflagellida;Codonosigidae      20      40
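
For the bash side of the question, a two-pass awk sketch along the following lines may work. It assumes the table is tab-separated with a single header line, so that a taxon name containing a space stays in one field. The file names abundance_demo.tsv and abundance_percent.tsv are made up for this illustration, and the first few lines only recreate the example data.

```shell
# Recreate the example table as a tab-separated file (hypothetical file name).
printf 'Taxa\tSample1\tSample2\n' > abundance_demo.tsv
printf 'Eukaryota;Alveolata;Apicomplexa\t1000\t500\n' >> abundance_demo.tsv
printf 'Eukaryota;Alveolata;Dinophyceae\t2000\t500\n' >> abundance_demo.tsv
printf 'Eukaryota;Alveolata;Unclassified Alveolata\t500\t1000\n' >> abundance_demo.tsv
printf 'Eukaryota;Choanoflagellida;Acanthoecidae\t500\t1000\n' >> abundance_demo.tsv
printf 'Eukaryota;Choanoflagellida;Codonosigidae\t1000\t2000\n' >> abundance_demo.tsv

# The same file is read twice.
# Pass 1 (NR == FNR): add up every sample column, skipping the header line.
# Pass 2: print the header unchanged, then replace each count with its
#         percentage of the column total (0 if the column total is 0).
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { if (FNR > 1) for (i = 2; i <= NF; i++) total[i] += $i; next }
     FNR == 1  { print; next }
     { for (i = 2; i <= NF; i++) $i = (total[i] ? 100 * $i / total[i] : 0); print }
' abundance_demo.tsv abundance_demo.tsv > abundance_percent.tsv

cat abundance_percent.tsv
```

Percentages that are not whole numbers are printed with awk's default number formatting; wrapping the value in sprintf("%.2f", ...) is one way to fix the number of decimals.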
