david

Reputation: 825

summarise_if and concatenate the IDs that were summed

Here is the code that builds my data frame:

d <- structure(list(Gene = structure(1:3, .Label = c("k141_20041_1", 
    "k141_27047_2", "k141_70_3"), class = "factor"), phylum = structure(c(1L, 
    1L, 1L), .Label = "Firmicutes", class = "factor"), class = structure(c(1L, 
    1L, 1L), .Label = "Bacillales", class = "factor"), order = structure(c(1L, 
    1L, 1L), .Label = "Bacilli", class = "factor"), family = structure(c(1L, 
    1L, 1L), .Label = "Bacillaceae", class = "factor"), genus = structure(c(1L, 
    1L, 1L), .Label = "Bacillus", class = "factor"), species = structure(c(1L, 
    1L, 2L), .Label = c("Bacillus subtilis", "unknown"), class = "factor"), 
        SampleA = c(0, 0, 0), SampleB = c(0, 0, 0), SampleCtrl = c(3.98888888888889, 
        11.5555555555556, 3.35978835978836)), .Names = c("Gene", 
    "phylum", "class", "order", "family", "genus", "species", "SampleA", 
    "SampleB", "SampleCtrl"), row.names = c(21918L, 40410L, 40857L
    ), class = "data.frame")

Here is the resulting data frame:

Gene     phylum      class   order      family    genus           species SampleA SampleB
k141_20041_1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0
k141_27047_2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0
k141_70_3 Firmicutes Bacillales Bacilli Bacillaceae Bacillus           unknown       0       0
  SampleCtrl
  3.99
 11.56
  3.36

I'm summarizing as follows:

library(dplyr)
d%>%
group_by(phylum,class,order,family,genus, species)%>%
summarise_if(is.numeric, sum)    

      phylum      class   order      family    genus           species SampleA SampleB SampleCtrl
      <fctr>     <fctr>  <fctr>      <fctr>   <fctr>            <fctr>   <dbl>   <dbl>      <dbl>
1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0   15.54444
2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus           unknown       0       0    3.35979

I would like to add one column that concatenates the Gene IDs that were summed together. For example, the result would look like this:

    phylum      class   order      family    genus           species SampleA SampleB SampleCtrl Gene
      <fctr>     <fctr>  <fctr>      <fctr>   <fctr>            <fctr>   <dbl>   <dbl>      <dbl>
1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0   15.54444  k141_20041_1,k141_27047_2
2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus           unknown       0       0    3.35979 k141_70_3 

Thanks for your help.

Upvotes: 2

Views: 124

Answers (1)

Pierre Lapointe

Reputation: 16277

Use toString inside a grouped mutate so that every row carries the concatenated gene IDs of its group, then group again on the same columns plus the new Gene column so that summarise_if keeps it in the final table.

library(dplyr)
d%>%
  group_by(phylum,class,order,family,genus, species)%>%
  mutate(Gene=toString(Gene))%>%
  group_by(phylum,class,order,family,genus, species,Gene)%>%
  summarise_if(is.numeric, sum)   
      phylum      class   order      family    genus           species                       Gene SampleA SampleB SampleCtrl
      <fctr>     <fctr>  <fctr>      <fctr>   <fctr>            <fctr>                      <chr>   <dbl>   <dbl>      <dbl>
1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis k141_20041_1, k141_27047_2       0       0  15.544444
2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus           unknown                  k141_70_3       0       0   3.359788
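
As an alternative sketch, the same result can probably be reached in a single summarise call. This assumes dplyr 1.0.0 or later, where across() and where() are available; the trimmed-down data frame below is a hypothetical stand-in for d that keeps only one grouping column. It uses paste(..., collapse = ",") rather than toString so that the separator matches the comma-without-space format shown in the question.

```r
library(dplyr)

# Hypothetical, trimmed-down copy of the question's data (one grouping column only)
d_small <- data.frame(
  Gene = c("k141_20041_1", "k141_27047_2", "k141_70_3"),
  species = c("Bacillus subtilis", "Bacillus subtilis", "unknown"),
  SampleA = c(0, 0, 0),
  SampleCtrl = c(3.98888888888889, 11.5555555555556, 3.35978835978836)
)

res <- d_small %>%
  group_by(species) %>%
  summarise(
    # Collapse the gene IDs of each group into one comma-separated string
    Gene = paste(Gene, collapse = ","),
    # Sum every numeric column; Gene is character, so it is left alone
    across(where(is.numeric), sum),
    .groups = "drop"
  )

print(res)
```

With the full data frame, the group_by call would list all six taxonomy columns as in the answer above. Swapping paste(Gene, collapse = ",") for toString(Gene) gives the ", " separator instead.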

Upvotes: 1
