vams
vams

Reputation: 13

How can I sum consecutive columns nth times in a data.frame R

I have a table I have read using the function read_csv. As an example, I have created a small data.frame to illustrate the structure.

In reality, the table is the count of 48160 bacterias (ASV_1 to ASV_48160 - rows) found in different samples (168 columns). The samples go from sample 1 - S1(a, b, c) - to sample 56 - S56(a, b, c) (a, b, and c are sub-samples).

My objective is to sum the a, b, c in every sample and get just one value. So, instead of me having  S1a, S1b, S1c, I want to sum rows and have S1, keeping the same number of rows, and do the same with the 56 samples. Eventually, I want to keep just columns S1, S2, S3...S56 and erase the columns with a, b, c. 

Bacteria <- data.frame(
  ID=c("ASV_1", "ASV_2", "ASV_3", "ASV_4", "ASV_5", "ASV_6", "ASV_7", "ASV_8", "ASV_9", "ASV_10"),
  s1a=c(135, 249, 142, 130, 20, 0, 0, 10, 85, 32),
  s1b=c(14, 85, 0, 12, 0, 59, 0, 0, 96, 43),
  s1c= c(0, 2, 8, 56, 145, 25, 65, 32, 25, 0),
  s2a= c(54, 88, 65, 14, 0, 0, 4, 165, 87, 0),
  s2b= c(33, 0, 74, 19, 22, 0, 5, 0, 3, 10),
  s2c= c(178, 0, 1, 1, 0, 1, 66, 3, 0, 221),
  s3a=c(135, 249, 12, 130, 20, 110, 0, 10, 55, 32),
  s3b= c(99, 88, 65, 58, 0, 0, 4, 5, 87, 0),
  s3c=c(90, 54, 6, 8, 0, 11, 91, 1,9, 0))
str(Bacteria)

I have tried some options with mutate from tidyverse, but that works for one sample (or at least I don't know how to repeat the operation for the 56 samples), and I don't want to do it manually from 1 to 56

    Bacteria <- Bacteria %>% 
      mutate(s1=s1a+s1b+s1c)

I have also tried rowSums, but again I don't know how to repeat this line of code for the other samples. (I am a beginner in R). Please, Does anyone know a better option? I appreciate the help!

Upvotes: 1

Views: 93

Answers (1)

lroha
lroha

Reputation: 34441

One approach would be to pivot the data to long form, remove the subsample identifier and then pivot back to wide using the values_fn argument to sum the common values.

library(dplyr)
library(tidyr)

Bacteria %>%
  pivot_longer(-ID) %>%
  mutate(name = sub("[a-c]$", "", name)) %>%
  pivot_wider(values_from = value, names_from = name, values_fn = list(value = sum))

# A tibble: 10 x 4
   ID        s1    s2    s3
   <chr>  <dbl> <dbl> <dbl>
 1 ASV_1    149   265   324
 2 ASV_2    336    88   391
 3 ASV_3    150   140    83
 4 ASV_4    198    34   196
 5 ASV_5    165    22    20
 6 ASV_6     84     1   121
 7 ASV_7     65    75    95
 8 ASV_8     42   168    16
 9 ASV_9    206    90   151
10 ASV_10    75   231    32

Upvotes: 1

Related Questions