Reputation: 31
I have a file whose first few lines are:
                      bacttaxa LL8388  UL8388  LL8384  LL8381  UL8382  LL8385
13603   Yokenella regensburgei      0   0.000   0.000   0.000   0.000  76.192
15068   Yokenella regensburgei      0   0.000   0.000 399.583   0.000   0.000
11518 Zobellia galactanivorans      0  83.133 200.795  79.862  90.273  29.303
19706 Zobellia galactanivorans      0 327.694   0.000 605.251 214.366 453.391
608      Zunongwangia profunda      0   0.000   0.000   0.000   0.000  96.438
3159     Zunongwangia profunda      0  14.865  23.004  28.628  11.166  53.613
How can I sum the other columns for rows that share the same value in the first column, so that I get one total per bacterial taxon? Any ideas? Thank you!
Upvotes: 2
Views: 722
Reputation: 193677
As mentioned in the comments, this is an "aggregation" question. As such, an obvious choice is the aggregate function in base R:
aggregate(. ~ bacttaxa, x, sum)
#                   bacttaxa LL8388  UL8388  LL8384  LL8381  UL8382  LL8385
# 1   Yokenella regensburgei      0   0.000   0.000 399.583   0.000  76.192
# 2 Zobellia galactanivorans      0 410.827 200.795 685.113 304.639 482.694
# 3    Zunongwangia profunda      0  14.865  23.004  28.628  11.166 150.051
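The call above relies on a data frame named x that is not defined in the post. A minimal sketch for reproducing it is given here, assuming the leading numbers in the question (13603, 15068, ...) are row names and not a data column; that layout is an assumption inferred from the printed output, and the values are simply the six rows shown in the question.

```r
# Rebuild the six rows shown in the question.
# Assumption: the leading numbers are row names, not a column of their own.
x <- data.frame(
  bacttaxa = c("Yokenella regensburgei", "Yokenella regensburgei",
               "Zobellia galactanivorans", "Zobellia galactanivorans",
               "Zunongwangia profunda", "Zunongwangia profunda"),
  LL8388 = c(0, 0, 0, 0, 0, 0),
  UL8388 = c(0.000, 0.000, 83.133, 327.694, 0.000, 14.865),
  LL8384 = c(0.000, 0.000, 200.795, 0.000, 0.000, 23.004),
  LL8381 = c(0.000, 399.583, 79.862, 605.251, 0.000, 28.628),
  UL8382 = c(0.000, 0.000, 90.273, 214.366, 0.000, 11.166),
  LL8385 = c(76.192, 0.000, 29.303, 453.391, 96.438, 53.613),
  row.names = c("13603", "15068", "11518", "19706", "608", "3159")
)

# In the formula, "." stands for every column not named on the right-hand
# side, so each numeric column is summed within each bacttaxa group.
res <- aggregate(. ~ bacttaxa, data = x, FUN = sum)
res
```

One thing to keep in mind if the full file has missing values: the formula interface of aggregate drops any row containing an NA by default (na.action = na.omit), so it may be worth checking for NAs before trusting the totals.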
You can also explore the "data.table" and "dplyr" packages.
## A data.table approach
library(data.table)
as.data.table(x)[, lapply(.SD, sum), by = bacttaxa]
## A dplyr approach
library(dplyr)
# across() requires dplyr >= 1.0.0
x %>%
  group_by(bacttaxa) %>%
  summarise(across(everything(), sum))
Upvotes: 3