Reputation: 136
I have my data:
library(dplyr)
Sample.no <- c(1,1,1,2,2,1,1,1,1,2,2)
Group <-c('a','b','c','a','b','a','b','c','d','a','c')
Abundance <- Sample.no * rep_len(c(3,1,4,7,2), length(Sample.no)) # recycle explicitly to avoid a length-mismatch warning
df<-data.frame(Sample.no,Group,Abundance)
giving
   Sample.no Group Abundance
1          1     a         3
2          1     b         1
3          1     c         4
4          2     a        14
5          2     b         4
6          1     a         3
7          1     b         1
8          1     c         4
9          1     d         7
10         2     a         4
11         2     c         6
I want to create a summary similar to this:
df<-group_by(df,Sample.no)
df<-summarise(df,number=n(),total=sum(Abundance))
  Sample.no number total
1         1      7    23
2         2      4    28
However, I'd also like a column with the total Abundance of the 'a's in each sample, in order to work out relative abundance. I've tried custom functions with no success. Is there an easy way to do this in dplyr?
Upvotes: 0
Views: 2138
Reputation: 13570
Using aggregate and xtabs:
total <- aggregate(Abundance ~ Sample.no, data=df,
                   FUN = function(x) c(num = length(x), total = sum(x)))
group <- as.data.frame.matrix(xtabs(Abundance ~ Sample.no + Group, df))
cbind(total, group)
Output:
  Sample.no Abundance.num Abundance.total  a b c d
1         1             7              23  6 2 8 7
2         2             4              28 18 4 6 0
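To get from these columns to the relative abundance mentioned in the question, one possible follow-up is to divide the a column by the per-sample total. The following is only a minimal sketch: it rebuilds the sample data so it runs on its own, and the names res and rel_a are illustrative, not part of the original answer.

```r
# Rebuild the sample data from the question (values written out explicitly).
df <- data.frame(
  Sample.no = c(1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2),
  Group     = c("a", "b", "c", "a", "b", "a", "b", "c", "d", "a", "c"),
  Abundance = c(3, 1, 4, 14, 4, 3, 1, 4, 7, 4, 6)
)

# Per-sample count and total; aggregate() stores both in one matrix column.
total <- aggregate(Abundance ~ Sample.no, data = df,
                   FUN = function(x) c(num = length(x), total = sum(x)))

# Per-sample, per-group sums as a wide data frame.
group <- as.data.frame.matrix(xtabs(Abundance ~ Sample.no + Group, df))

res <- cbind(total, group)

# rel_a is a hypothetical column name: the share of 'a' in each sample's total.
res$rel_a <- res$a / total$Abundance[, "total"]
res
```

The total is read from total$Abundance because aggregate() returns the num and total values as a two-column matrix, so it has to be indexed by column name.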
Upvotes: 0
Reputation: 118849
Here's one way using data.table:
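require(data.table) # v1.9.6
# Group must be a factor so that tapply() returns every level for each sample;
# data.frame() only creates factors by default before R 4.0.0.
setDT(df)[, Group := factor(Group)]
df[, c(list(num = .N, tot = sum(Abundance)),
       tapply(Abundance, Group, sum)),
   by = Sample.no]
#    Sample.no num tot  a b c  d
# 1:         1   7  23  6 2 8  7
# 2:         2   4  28 18 4 6 NA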
I use tapply() instead of joins using .SD since we need a named list here, and tapply()'s output format makes it very convenient.
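Since the question asked for a dplyr approach, here is a minimal sketch of one for comparison: subset Abundance by Group inside summarise(). The column names a_total and rel_a are illustrative assumptions, and the sample data is rebuilt so the block runs on its own.

```r
library(dplyr)

# Rebuild the sample data from the question (values written out explicitly).
df <- data.frame(
  Sample.no = c(1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2),
  Group     = c("a", "b", "c", "a", "b", "a", "b", "c", "d", "a", "c"),
  Abundance = c(3, 1, 4, 14, 4, 3, 1, 4, 7, 4, 6)
)

res <- df %>%
  group_by(Sample.no) %>%
  summarise(
    number  = n(),
    total   = sum(Abundance),
    # Sum only the rows of this sample whose Group is 'a'.
    a_total = sum(Abundance[Group == "a"]),
    # Later summarise() expressions can use columns created earlier.
    rel_a   = a_total / total
  )
res
```

The same pattern extends to other groups by adding one summarise() expression per group of interest.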
Upvotes: 1