Tom

Reputation: 136

Working out relative abundances with dplyr

I have my data:

library(dplyr)
Sample.no <- c(1,1,1,2,2,1,1,1,1,2,2)
Group <- c('a','b','c','a','b','a','b','c','d','a','c')
# The length-5 multiplier is recycled along Sample.no (R warns about the length mismatch)
Abundance <- Sample.no * c(3,1,4,7,2)
df <- data.frame(Sample.no, Group, Abundance)

giving

   Sample.no Group Abundance
1          1     a         3
2          1     b         1
3          1     c         4
4          2     a        14
5          2     b         4
6          1     a         3
7          1     b         1
8          1     c         4
9          1     d         7
10         2     a         4
11         2     c         6

I want to create a summary similar to this:

# Keep the original df intact so Group and Abundance remain available
df_grouped <- group_by(df, Sample.no)
summarise(df_grouped, number = n(), total = sum(Abundance))

  Sample.no number total
1         1      7    23
2         2      4    28

However, I'd also like a column with the total Abundance of 'a's in each sample, so that I can work out relative abundance. I've tried custom functions with no success. Is there an easy way to do this in dplyr?

Upvotes: 0

Views: 2138

Answers (2)

mpalanco

Reputation: 13570

Using aggregate and xtabs:

total <- aggregate(Abundance ~ Sample.no, data=df, 
                   FUN = function(x) c(num = length(x), total = sum(x)))
group <- as.data.frame.matrix(xtabs(Abundance ~ Sample.no + Group, df))
cbind(total, group)

Output:

  Sample.no Abundance.num Abundance.total  a b c d
1         1             7              23  6 2 8 7
2         2             4              28 18 4 6 0
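
Since the question asks about dplyr specifically, the per-sample total for 'a' can probably be obtained there as well by subsetting Abundance inside summarise(). This is only a minimal sketch, assuming dplyr is installed; the column names a_total and a_rel are illustrative and not from the question, and the data is rebuilt here with the multiplier recycled explicitly to avoid the warning.

```r
library(dplyr)

# Rebuild the question's data; rep() makes the recycling explicit.
Sample.no <- c(1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2)
Group <- c('a', 'b', 'c', 'a', 'b', 'a', 'b', 'c', 'd', 'a', 'c')
Abundance <- Sample.no * rep(c(3, 1, 4, 7, 2), length.out = length(Sample.no))
df <- data.frame(Sample.no, Group, Abundance)

res <- df %>%
  group_by(Sample.no) %>%
  summarise(
    number  = n(),                           # rows per sample
    total   = sum(Abundance),                # total abundance per sample
    a_total = sum(Abundance[Group == "a"])   # abundance of group 'a' only
  ) %>%
  mutate(a_rel = a_total / total)            # relative abundance of 'a'

print(res)
```

For this data a_total comes out as 6 for sample 1 and 18 for sample 2, matching the a column above. The comparison Group == "a" works whether Group is stored as a character vector or as a factor.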

Upvotes: 0

Arun

Reputation: 118849

Here's one way using data.table:

require(data.table) # v1.9.6
setDT(df)[, c(list(num = .N, tot = sum(Abundance)), 
                   tapply(Abundance, Group, sum)), 
            by = Sample.no]
#    Sample.no num tot  a b c  d
# 1:         1   7  23  6 2 8  7
# 2:         2   4  28 18 4 6 NA

I use tapply() instead of joins using .SD since we need a named list here, and tapply()'s output format makes it very convenient.
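
To see what that named list looks like, it may help to build it by hand for a single sample in base R. This is a rough sketch for sample 2 only, assuming Group is a factor with levels a to d (the default for data.frame() in R versions before 4.0); the variable names are illustrative.

```r
# Sample 2 rows only, copied by hand from the question's data.
abundance <- c(14, 4, 4, 6)
group <- factor(c("a", "b", "a", "c"), levels = c("a", "b", "c", "d"))

# tapply() returns a named 1-d array with one entry per factor level:
# a = 18, b = 4, c = 6, d = NA (level "d" has no rows in this sample).
per_group <- tapply(abundance, group, sum)

# c() on a list and a named vector yields a named list, one element per
# column, which is the shape expected from j when grouping with `by`.
cols <- c(list(num = length(abundance), tot = sum(abundance)), per_group)
str(cols)
```

The NA for d in the second row of the output comes from the empty factor level. If Group were a character vector instead, tapply() would return only the groups present in each sample, so the per-sample lists would no longer share the same names.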

Upvotes: 1
