Reputation: 251
I'm trying to calculate relative abundances based on row labels or names (that is, get the relative abundance for each test in df$path1). So I'd like to calculate the relative abundance of counts from test1, and calculate the relative abundance of counts from test2 separately. The sum of the relative abundance numbers from test1 would equal 1.
I'm currently using the vegan package, but I'm open to other options.
Test dataset:
library(vegan)
df <- data.frame(x = c("a", "b", "c", "d", "e"),
                 path1 = c("test1", "test1", "test2", "test2", "test3"),
                 value = c(40, 10, 34, 12, 20))
df$relabun <- decostand(df[3], MARGIN = 2, method = "total") # takes relative abundance of the whole column
Ideal output for relative abundance based on df$path1 would look like this:
x path1 relabun_bypath1
a test1 0.8
b test1 0.2
c test2 0.74
d test2 0.26
e test3 1
Upvotes: 1
Views: 14502
Reputation: 43334
This is a classic split–apply–combine question. The most literal way in base R is to split, *apply, and do.call(rbind, ...) or unlist, so
unlist(lapply(split(df, df$path1), function(x){x$value / sum(x$value)}))
# test11 test12 test21 test22 test3
# 0.8000000 0.2000000 0.7391304 0.2608696 1.0000000
which we can assign to a new variable. However, base has a nice if oddly-named function called ave, which can apply a function across groups for us:
ave(df$value, df$path1, FUN = function(x){x / sum(x)})
# [1] 0.8000000 0.2000000 0.7391304 0.2608696 1.0000000
which is a good deal more concise, and can likewise be assigned to a new variable.
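As a minimal sketch of that assignment, using the test data from the question: the column name relabun_bypath1 is taken from the asker's ideal output, and group_sums is just an illustrative name for a sanity check that each group's proportions add up to 1.

```r
# Test data from the question
df <- data.frame(x = c("a", "b", "c", "d", "e"),
                 path1 = c("test1", "test1", "test2", "test2", "test3"),
                 value = c(40, 10, 34, 12, 20))

# Divide each value by the total of its own path1 group
df$relabun_bypath1 <- ave(df$value, df$path1, FUN = function(v) v / sum(v))

# Sanity check: the proportions within each group should sum to 1
group_sums <- tapply(df$relabun_bypath1, df$path1, sum)
print(group_sums)
```

Because ave returns a vector in the original row order, the result lines up with the rows of df without any reordering or merging.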
If you prefer the Hadleyverse, dplyr's grouping can make the process more readable:
library(dplyr)
df %>% group_by(path1) %>% mutate(relAbundByPath = value / sum(value))
# Source: local data frame [5 x 4]
# Groups: path1 [3]
#
# x path1 value relAbundByPath
# (fctr) (fctr) (dbl) (dbl)
# 1 a test1 40 0.8000000
# 2 b test1 10 0.2000000
# 3 c test2 34 0.7391304
# 4 d test2 12 0.2608696
# 5 e test3 20 1.0000000
As you can see, it returns a new version of the data.frame, which we can use to overwrite the existing one or make a new copy.
Whichever route you choose, get comfortable with the logic, because you'll likely use it a lot. Better, learn all of them. And tapply and mapply/Map. And data.table...why not?
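For instance, a tapply version of the same calculation might look like the sketch below. It is only one possible way to use tapply here; totals and relabun_tapply are illustrative names, not part of the question.

```r
# Test data from the question
df <- data.frame(x = c("a", "b", "c", "d", "e"),
                 path1 = c("test1", "test1", "test2", "test2", "test3"),
                 value = c(40, 10, 34, 12, 20))

# One total per group, named by group label
totals <- tapply(df$value, df$path1, sum)

# Look up each row's group total by name, then divide;
# as.character() guards against path1 being a factor,
# and as.numeric() drops the names carried over from the lookup
df$relabun_tapply <- as.numeric(df$value / totals[as.character(df$path1)])
```

This makes the two steps (summarise per group, then broadcast back to the rows) explicit, which is what ave does in a single call.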
Note: You can also replace the value / sum(value) construct with the prop.table function if you like. It's more concise (e.g. ave(df$value, df$path1, FUN = prop.table)), but less obvious what it's doing, which is why I didn't use it here.
Upvotes: 3