Reputation: 13
I have a table I have read using the function read_csv. As an example, I have created a small data.frame to illustrate the structure.
In reality, the table is the count of 48160 bacterias (ASV_1 to ASV_48160 - rows) found in different samples (168 columns). The samples go from sample 1 - S1(a, b, c) - to sample 56 - S56(a, b, c) (a, b, and c are sub-samples).
My objective is to sum the a, b, c in every sample and get just one value. So, instead of me having S1a, S1b, S1c, I want to sum rows and have S1, keeping the same number of rows, and do the same with the 56 samples. Eventually, I want to keep just columns S1, S2, S3...S56 and erase the columns with a, b, c.
Bacteria <- data.frame(
ID=c("ASV_1", "ASV_2", "ASV_3", "ASV_4", "ASV_5", "ASV_6", "ASV_7", "ASV_8", "ASV_9", "ASV_10"),
s1a=c(135, 249, 142, 130, 20, 0, 0, 10, 85, 32),
s1b=c(14, 85, 0, 12, 0, 59, 0, 0, 96, 43),
s1c= c(0, 2, 8, 56, 145, 25, 65, 32, 25, 0),
s2a= c(54, 88, 65, 14, 0, 0, 4, 165, 87, 0),
s2b= c(33, 0, 74, 19, 22, 0, 5, 0, 3, 10),
s2c= c(178, 0, 1, 1, 0, 1, 66, 3, 0, 221),
s3a=c(135, 249, 12, 130, 20, 110, 0, 10, 55, 32),
s3b= c(99, 88, 65, 58, 0, 0, 4, 5, 87, 0),
s3c=c(90, 54, 6, 8, 0, 11, 91, 1,9, 0))
str(Bacteria)
I have tried some options with mutate
from tidyverse, but that works for one sample (or at least I don't know how to repeat the operation for the 56 samples), and I don't want to do it manually from 1 to 56
Bacteria <- Bacteria %>%
mutate(s1=s1a+s1b+s1c)
I have also tried rowSums
, but again I don't know how to repeat this line of code for the other samples. (I am a beginner in R).
Please, Does anyone know a better option? I appreciate the help!
Upvotes: 1
Views: 93
Reputation: 34441
One approach would be to pivot the data to long form, remove the subsample identifier and then pivot back to wide using the values_fn
argument to sum the common values.
library(dplyr)
library(tidyr)
Bacteria %>%
pivot_longer(-ID) %>%
mutate(name = sub("[a-c]$", "", name)) %>%
pivot_wider(values_from = value, names_from = name, values_fn = list(value = sum))
# A tibble: 10 x 4
ID s1 s2 s3
<chr> <dbl> <dbl> <dbl>
1 ASV_1 149 265 324
2 ASV_2 336 88 391
3 ASV_3 150 140 83
4 ASV_4 198 34 196
5 ASV_5 165 22 20
6 ASV_6 84 1 121
7 ASV_7 65 75 95
8 ASV_8 42 168 16
9 ASV_9 206 90 151
10 ASV_10 75 231 32
Upvotes: 1