Colin

Reputation: 31

R - how can I summarize other columns based on the value in one column

I have a file whose first few lines are:

                  bacttaxa LL8388  UL8388  LL8384  LL8381  UL8382  LL8385
13603   Yokenella regensburgei      0   0.000   0.000   0.000   0.000  76.192
15068   Yokenella regensburgei      0   0.000   0.000 399.583   0.000   0.000
11518 Zobellia galactanivorans      0  83.133 200.795  79.862  90.273  29.303
19706 Zobellia galactanivorans      0 327.694   0.000 605.251 214.366 453.391
608      Zunongwangia profunda      0   0.000   0.000   0.000   0.000  96.438
3159     Zunongwangia profunda      0  14.865  23.004  28.628  11.166  53.613

How can I sum the other columns over rows that share the same value in the first column, so that I get one total per bacterial taxon? Any ideas? Thank you!

Upvotes: 2

Views: 722

Answers (1)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193677

As mentioned in the comments, this is an "aggregation" question. As such, an obvious choice is the aggregate function in base R:

aggregate(. ~ bacttaxa, x, sum)
#                   bacttaxa LL8388  UL8388  LL8384  LL8381  UL8382  LL8385
# 1   Yokenella regensburgei      0   0.000   0.000 399.583   0.000  76.192
# 2 Zobellia galactanivorans      0 410.827 200.795 685.113 304.639 482.694
# 3    Zunongwangia profunda      0  14.865  23.004  28.628  11.166 150.051
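
To try this without the original file, here is a minimal sketch that rebuilds the six rows shown in the question by hand. It assumes the data sits in a data frame named x (the name used in the call above) and that the leading numbers are row names; the result name res is only for illustration.

```r
# Rebuild the six rows from the question; the leading numbers are assumed to be row names.
x <- data.frame(
  bacttaxa = c("Yokenella regensburgei", "Yokenella regensburgei",
               "Zobellia galactanivorans", "Zobellia galactanivorans",
               "Zunongwangia profunda", "Zunongwangia profunda"),
  LL8388 = c(0, 0, 0, 0, 0, 0),
  UL8388 = c(0, 0, 83.133, 327.694, 0, 14.865),
  LL8384 = c(0, 0, 200.795, 0, 0, 23.004),
  LL8381 = c(0, 399.583, 79.862, 605.251, 0, 28.628),
  UL8382 = c(0, 0, 90.273, 214.366, 0, 11.166),
  LL8385 = c(76.192, 0, 29.303, 453.391, 96.438, 53.613),
  row.names = c("13603", "15068", "11518", "19706", "608", "3159")
)

# "." on the left-hand side stands for every column not named on the right,
# so each numeric column is summed within each value of bacttaxa.
res <- aggregate(. ~ bacttaxa, data = x, FUN = sum)
print(res)
```

One thing to keep in mind: the formula interface of aggregate uses na.action = na.omit by default, so a row with an NA in any column is dropped before summing.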

You can also explore the "data.table" and "dplyr" packages.

## A data.table approach
library(data.table)
as.data.table(x)[, lapply(.SD, sum), by = bacttaxa]

## A dplyr approach
## (summarise_each() and funs() are deprecated; across() needs dplyr >= 1.0.0)
library(dplyr)
x %>%
  group_by(bacttaxa) %>%
  summarise(across(everything(), sum))

Upvotes: 3
