Reputation: 31
I am comparing two groups of lengths (from different individuals) with boxplots made with the ggplot2 package in R. I want to compare the two distributions, but so far the only way I have found to run a Wilcoxon test is stat_compare_means from the "ggpubr" package. Is that the right way to compare the distributions? Can I compare the distributions themselves rather than just the means? As you can see, I am a newbie in the stats world. Thank you!
Upvotes: 3
Views: 7670
Reputation: 5747
Base R has a built-in function to do a Wilcoxon test: wilcox.test. You can feed it two numeric vectors, or a formula relating a numeric variable to a factor variable with two levels.
# vector input
setosa_SL <- iris$Sepal.Length[which(iris$Species == "setosa")]
versicolor_SL <- iris$Sepal.Length[which(iris$Species == "versicolor")]
wilcox.test(setosa_SL, versicolor_SL)
Wilcoxon rank sum test with continuity correction
data: setosa_SL and versicolor_SL
W = 168.5, p-value = 8.346e-14
alternative hypothesis: true location shift is not equal to 0
# formula input
wilcox.test(Sepal.Length ~ Species, data = iris[which(iris$Species != "virginica"),])
Wilcoxon rank sum test with continuity correction
data: Sepal.Length by Species
W = 168.5, p-value = 8.346e-14
alternative hypothesis: true location shift is not equal to 0
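Both calls return an object of class htest, so the statistic and p-value can be pulled out programmatically instead of read off the printout. A minimal sketch on the same iris subset; the names res_vec, res_formula and two_species are just illustrative:

```r
# Setosa vs versicolor sepal lengths, tested through both interfaces
setosa_SL <- iris$Sepal.Length[iris$Species == "setosa"]
versicolor_SL <- iris$Sepal.Length[iris$Species == "versicolor"]
res_vec <- wilcox.test(setosa_SL, versicolor_SL)

# droplevels() removes the unused "virginica" level from the factor
two_species <- droplevels(subset(iris, Species != "virginica"))
res_formula <- wilcox.test(Sepal.Length ~ Species, data = two_species)

# An htest object is a list: the W statistic and p-value are its elements
res_vec$statistic
res_vec$p.value

# Both interfaces compare the same two groups, so the results agree
all.equal(res_vec$p.value, res_formula$p.value)
```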
However, iris$Species has three levels. What if we wanted to compare all three?
The base stats package also has pairwise.wilcox.test.
pairwise.wilcox.test(iris$Sepal.Length, iris$Species)
Pairwise comparisons using Wilcoxon rank sum test with continuity correction
data: iris$Sepal.Length and iris$Species
setosa versicolor
versicolor 1.7e-13 -
virginica < 2e-16 5.9e-07
P value adjustment method: holm
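Holm is only the default correction; the p.adjust.method argument accepts any method known to p.adjust. The result is a list of class pairwise.htest whose p-values sit in a lower-triangular matrix. A minimal base R sketch (res_holm, res_bonf, p_mat and p_long are illustrative names) that switches the correction and flattens the matrix into one row per comparison:

```r
# Pairwise Wilcoxon rank sum tests across all three species
res_holm <- pairwise.wilcox.test(iris$Sepal.Length, iris$Species)  # Holm by default
res_bonf <- pairwise.wilcox.test(iris$Sepal.Length, iris$Species,
                                 p.adjust.method = "bonferroni")

class(res_holm)            # "pairwise.htest"
res_holm$p.adjust.method   # "holm"

# Lower-triangular matrix: rows are group2, columns are group1, upper part is NA
p_mat <- res_holm$p.value
p_long <- data.frame(
  group1 = rep(colnames(p_mat), each = nrow(p_mat)),
  group2 = rep(rownames(p_mat), times = ncol(p_mat)),
  p.adj  = as.vector(p_mat)  # as.vector reads the matrix column by column
)
p_long <- p_long[!is.na(p_long$p.adj), ]  # keep only real comparisons
p_long
```

Bonferroni never gives smaller adjusted p-values than Holm, which is why Holm is the usual default.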
Now, I suspect you want to graph this. You need pairwise_wilcox_test and add_xy_position from the rstatix package, and stat_pvalue_manual from the ggpubr package. The pairwise_wilcox_test function is more convenient than the base R pairwise.wilcox.test for this, since it returns a tibble rather than a list of class pairwise.htest.
library(rstatix)
library(ggpubr)  # also attaches ggplot2
iris %>% pairwise_wilcox_test(Sepal.Length ~ Species)
# A tibble: 3 x 9
.y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
* <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
1 Sepal.Length setosa versicolor 50 50 168. 8.35e-14 1.67e-13 ****
2 Sepal.Length setosa virginica 50 50 38.5 6.40e-17 1.92e-16 ****
3 Sepal.Length versicolor virginica 50 50 526 5.87e- 7 5.87e- 7 ****
The function add_xy_position adds x and y coordinate information to make this data more suitable for plotting, and stat_pvalue_manual adds a layer containing the p-value information.
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  stat_pvalue_manual(iris %>%
                       pairwise_wilcox_test(Sepal.Length ~ Species) %>%
                       add_xy_position())
Upvotes: 6
Reputation: 78937
This info is preliminary:
If you want to test whether your data are normally distributed, use a Kolmogorov-Smirnov test.
If the data are normally distributed, use a t-test to compare the means of your two groups.
If the data are not normally distributed, use the Wilcoxon rank sum test (= Mann-Whitney U test) to compare the two groups. Strictly, it tests whether values in one group tend to be larger than in the other; it amounts to a comparison of medians only if the two distributions have the same shape.
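The steps above can be sketched in base R as follows, with iris standing in for the asker's data (x and y are illustrative). Two assumptions differ from the list: it swaps in shapiro.test for the normality check, since ks.test against a normal whose mean and sd are estimated from the same data is not strictly valid, and it uses the usual but arbitrary 0.05 cutoff. It also runs a two-sample ks.test, which compares the whole distributions rather than only their centres, which is closer to what the question asks.

```r
# Two groups of lengths; iris stands in for the real data
x <- iris$Sepal.Length[iris$Species == "setosa"]
y <- iris$Sepal.Length[iris$Species == "versicolor"]

# 1. Normality check per group (Shapiro-Wilk, assumed 0.05 cutoff)
normal_x <- shapiro.test(x)$p.value > 0.05
normal_y <- shapiro.test(y)$p.value > 0.05

# 2. Choose the location test
if (normal_x && normal_y) {
  loc_test <- t.test(x, y)        # Welch two-sample t-test of the means
} else {
  loc_test <- wilcox.test(x, y)   # Wilcoxon rank sum / Mann-Whitney U test
}
loc_test$method

# 3. Two-sample Kolmogorov-Smirnov test: compares the entire distributions
#    (location and shape); R may warn that the p-value is approximate with ties
dist_test <- ks.test(x, y)
dist_test$p.value
```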
Post your data using dput() and I can show you the code.
Upvotes: 0