Reputation: 47
I'm writing a code that, from a table of raw data (output of microbiome QIIME analysis), generates a t-test per group of all the rows. Every row has a bacteria and the values corresponding for every sample. The table can be huge, like 80 columns per 400 rows.
Group_phylum_data:
label_Group Bacteria_Firmicutes Archaea_Other Archaea_Euryarchaeota Bacteria_Other
HC 6.771703e-05 0 0.000000000 9.480385e-04
HC 3.362588e-05 0 0.016835356 5.604313e-05
HC 0.000000e+00 0 0.000000000 2.209945e-04
EPI 0.000000e+00 0 0.001121252 2.466755e-04
EPI 0.000000e+00 0 0.000000000 3.335038e-04
So now these are just the first lines with 2 groups (HC and EPI). I want to run a t-test for each bacteria in the columns among the groups. I've found this pairwise_t_test from the rstatix package and it does exactly what I want, returning also the adjusted p-value. Since the groups can be more than 2, I chose this pairwise_t_test because it can handle them and perform the stats for every combination.
pwc1 <- Group_phylum_data %>%
pairwise_t_test(Bacteria_Firmicutes ~ label_Group, p.adjust.method = "bonferroni")
pwc1
The problem is that I can't find a way to make it a loop for entering each bacteria name and obtain a complete table with a bacteria per row and the stats in the corresponding columns, something like
.y. group1 group2 n1 n2 p p.signif p.adj p.adj.signif
<chr> <chr> <chr> <int> <int> <dbl> <chr> <dbl> <chr>
1 Bacteria_Firmicutes EPI HC 46 28 0.82 ns 0.82 ns
2 Archaea_Other EPI HC 46 28 0.453 ns 0.453 ns
which I obtained by manually performing the analysis inserting the bacteria names. I tried to save the names in an array and substitute the single name (in the example, "Bacteria_Firmicutes") with something like names[i] but it doesn't work. Maybe that's a limit of this script, which only works with a specific name... or maybe I did something wrong? Or, is there another and maybe better way to obtain the output I want for this long dataset? Thank you!
Upvotes: 0
Views: 706
Reputation: 39613
You can try this (Archaea_Other
is zero, so no output is produced). I hope this helped.
library(reshape2)
library(rstatix)
#Melt
Melted <- reshape2::melt(data,id.vars = 'label_Group')
#Stat test
pwc1 <- Melted %>% group_by(variable) %>%
pairwise_t_test(value ~ label_Group, p.adjust.method = "bonferroni")
# A tibble: 3 x 10
variable .y. group1 group2 n1 n2 p p.signif p.adj p.adj.signif
* <fct> <chr> <chr> <chr> <int> <int> <dbl> <chr> <dbl> <chr>
1 Bacteria_Firmicutes value EPI HC 2 3 0.273 ns 0.273 ns
2 Archaea_Euryarchaeota value EPI HC 2 3 0.536 ns 0.536 ns
3 Bacteria_Other value EPI HC 2 3 0.761 ns 0.761 ns
Data
data <- structure(list(label_Group = c("HC", "HC", "HC", "EPI", "EPI"
), Bacteria_Firmicutes = c(6.771703e-05, 3.362588e-05, 0, 0,
0), Archaea_Other = c(0L, 0L, 0L, 0L, 0L), Archaea_Euryarchaeota = c(0,
0.016835356, 0, 0.001121252, 0), Bacteria_Other = c(0.0009480385,
5.604313e-05, 0.0002209945, 0.0002466755, 0.0003335038)), class = "data.frame", row.names = c(NA,
-5L))
Upvotes: 0