Reputation: 825
Here is the code that builds my data frame:
d <- structure(list(Gene = structure(1:3, .Label = c("k141_20041_1",
"k141_27047_2", "k141_70_3"), class = "factor"), phylum = structure(c(1L,
1L, 1L), .Label = "Firmicutes", class = "factor"), class = structure(c(1L,
1L, 1L), .Label = "Bacillales", class = "factor"), order = structure(c(1L,
1L, 1L), .Label = "Bacilli", class = "factor"), family = structure(c(1L,
1L, 1L), .Label = "Bacillaceae", class = "factor"), genus = structure(c(1L,
1L, 1L), .Label = "Bacillus", class = "factor"), species = structure(c(1L,
1L, 2L), .Label = c("Bacillus subtilis", "unknown"), class = "factor"),
SampleA = c(0, 0, 0), SampleB = c(0, 0, 0), SampleCtrl = c(3.98888888888889,
11.5555555555556, 3.35978835978836)), .Names = c("Gene",
"phylum", "class", "order", "family", "genus", "species", "SampleA",
"SampleB", "SampleCtrl"), row.names = c(21918L, 40410L, 40857L
), class = "data.frame")
Here is the output dataframe:
Gene         phylum     class      order   family      genus    species           SampleA SampleB SampleCtrl
k141_20041_1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0       3.99
k141_27047_2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis       0       0      11.56
k141_70_3    Firmicutes Bacillales Bacilli Bacillaceae Bacillus unknown                 0       0       3.36
I'm summarizing as follows:
library(dplyr)
d %>%
  group_by(phylum, class, order, family, genus, species) %>%
  summarise_if(is.numeric, sum)
phylum class order family genus species SampleA SampleB SampleCtrl
<fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <dbl> <dbl> <dbl>
1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis 0 0 15.54444
2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus unknown 0 0 3.35979
I would like to add one column that concatenates the Gene values that were collapsed into each summarized row. For example, the result would look like this:
phylum class order family genus species SampleA SampleB SampleCtrl Gene
<fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <dbl> <dbl> <dbl>
1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis 0 0 15.54444 k141_20041_1,k141_27047_2
2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus unknown 0 0 3.35979 k141_70_3
Thanks for your help.
Upvotes: 2
Views: 124
Reputation: 16277
Basically, you want to use toString to paste the genes together within each group, and then group on the same columns plus the new Gene column, so that summarise keeps it in the final table.
library(dplyr)
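d %>%
  group_by(phylum, class, order, family, genus, species) %>%
  # within each group, replace Gene with the comma-separated list of its genes
  mutate(Gene = toString(Gene)) %>%
  # regroup with Gene included so it survives the summary
  group_by(phylum, class, order, family, genus, species, Gene) %>%
  summarise_if(is.numeric, sum)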
phylum class order family genus species Gene SampleA SampleB SampleCtrl
<fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <chr> <dbl> <dbl> <dbl>
1 Firmicutes Bacillales Bacilli Bacillaceae Bacillus Bacillus subtilis k141_20041_1, k141_27047_2 0 0 15.544444
2 Firmicutes Bacillales Bacilli Bacillaceae Bacillus unknown k141_70_3 0 0 3.359788
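Note that toString joins values with ", " (comma plus space), while the desired output in the question has no space. As a possible alternative, here is a minimal sketch that does the sum and the concatenation in a single summarise call. It assumes dplyr 1.0.0 or later (for across and .groups), and it uses a cut-down stand-in for d with only two of the grouping columns, so adapt the group_by call to the full taxonomy.

```r
library(dplyr)

# Cut-down stand-in for the question's data frame (assumption: only the
# columns needed to show the technique are kept)
d <- data.frame(
  Gene = c("k141_20041_1", "k141_27047_2", "k141_70_3"),
  genus = "Bacillus",
  species = c("Bacillus subtilis", "Bacillus subtilis", "unknown"),
  SampleA = c(0, 0, 0),
  SampleB = c(0, 0, 0),
  SampleCtrl = c(3.98888888888889, 11.5555555555556, 3.35978835978836)
)

res <- d %>%
  group_by(genus, species) %>%
  summarise(
    # sum every numeric column within each group
    across(where(is.numeric), sum),
    # collapse the gene names of each group into one string, no space after the comma
    Gene = paste(Gene, collapse = ","),
    .groups = "drop"
  )

print(res)
```

Using paste with collapse = "," instead of toString only changes the separator. Doing both steps inside summarise avoids the second group_by.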
Upvotes: 1