Reputation: 13
I am having trouble conducting Wilcoxon test analyses on heavily tied data. I have outlined my problem as best I can below, how I have tried to address it, and the questions I have. I'd be really grateful for any advice anyone could give me.
My Problem
I am working on a dataset where I need to compare three groups on a measure that was used for group assignment. When I run a one-way ANOVA, neither (1) the assumption of normality of residuals nor (2) the assumption of homogeneity of variance of residuals is met.
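For readers who want to reproduce this kind of assumption check, here is a minimal sketch. It uses made-up toy data (the `toy` data frame and its Poisson-distributed values are assumptions for illustration, not the real dataset), and it uses the Shapiro-Wilk and Fligner-Killeen tests as one possible choice of diagnostics.

```r
# Hypothetical toy data: three groups with very different spread
set.seed(1)
toy <- data.frame(
  measure = c(rpois(30, 0.3), rpois(30, 6), rpois(30, 9)),
  group   = factor(rep(c("1", "2", "3"), each = 30))
)

# Fit the one-way ANOVA whose assumptions we want to check
fit <- aov(measure ~ group, data = toy)

# (1) Normality of residuals (Shapiro-Wilk test)
sw <- shapiro.test(residuals(fit))

# (2) Homogeneity of variance across groups (Fligner-Killeen test)
fk <- fligner.test(measure ~ group, data = toy)

# Small p-values suggest the corresponding assumption is violated
c(shapiro_p = sw$p.value, fligner_p = fk$p.value)
```

Plots of the residuals (for example `plot(fit)`) are often more informative than the tests alone, especially with larger samples.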
I therefore used the Wilcoxon test to conduct pairwise comparisons in R with the following code (example for one comparison; the two-sided alternative hypothesis is the default):
wilcox.test(measure ~ group, data = myreduceddataset, na.rm = TRUE, paired = FALSE, exact = TRUE, conf.int = TRUE)
However, the output of my analysis looked strange to me (screenshot of example here), and R produced warnings for every comparison (one example copied below):
Warning messages:
1: In wilcox.test.default(x = c(2, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, :
  cannot compute exact p-value with ties
2: In wilcox.test.default(x = c(2, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, :
  cannot compute exact confidence intervals with ties
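These warnings indicate that the exact null distribution requested with `exact = TRUE` was not available because of the ties, so a normal approximation is used instead. A minimal sketch, assuming made-up toy vectors (`x` and `y` are hypothetical, not the real data), of asking for the approximation explicitly so that the call does not depend on an exact calculation:

```r
# Toy vectors with many tied values (hypothetical, for illustration only)
x <- c(0, 0, 0, 0, 1, 0, 2, 0, 1, 0)
y <- c(3, 5, 2, 7, 1, 4, 6, 2, 8, 5)

# Request the normal approximation with continuity correction directly;
# the variance of the statistic is adjusted for ties
res_approx <- wilcox.test(x, y, exact = FALSE, correct = TRUE)

res_approx$statistic  # rank-sum statistic W
res_approx$p.value    # approximate two-sided p-value
```

The approximation is usually reasonable for moderate sample sizes, but with very heavy ties a permutation-based test may be preferable.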
Checking the data
I then checked the data and looked at how the values are ranked in R to try to figure out the cause of the warnings. Although there are some tied ranks throughout, the main problem seems to be the number of 0 values in Group 1. Here is some example raw and ranked data by group.
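A check along these lines can be sketched as follows. The `toy` data frame is hypothetical and only mimics the pattern described (one group dominated by zeros); the column names `measure` and `group` follow the question.

```r
# Hypothetical toy data: group 1 is mostly zeros, group 2 is spread out
toy <- data.frame(
  measure = c(0, 0, 0, 1, 0, 2, 3, 5, 2, 7),
  group   = factor(rep(c("1", "2"), each = 5))
)

# Count how often each value occurs in each group
table(toy$measure, toy$group)

# rank() gives tied observations the average of the ranks they span
toy$rank <- rank(toy$measure, ties.method = "average")
toy

# Proportion of observations that share their value with another one
tied <- duplicated(toy$measure) | duplicated(toy$measure, fromLast = TRUE)
mean(tied)
```

Here the four zeros share ranks 1 to 4 and so each receives rank 2.5, which is the kind of tie that prevents the exact calculation.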
The solution I found, and questions this raised
From reading around, it appears that the solution could be to use the Wilcoxon test from the 'coin' package in R.
I had a go, and here is an example of my output. However, I am still not entirely clear on whether this is correct, and I have outlined some questions I still have below.
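For reference, a call of this kind might look like the sketch below. It assumes the `coin` package is installed and uses hypothetical toy data; it is not the exact call that produced the output above. `wilcox_test()` in `coin` needs the grouping variable to be a factor, and `distribution = "exact"` requests the exact conditional (permutation) distribution, which remains valid with ties.

```r
# Hypothetical toy data with ties; 'group' must be a factor for coin
toy <- data.frame(
  measure = c(0, 0, 0, 1, 0, 2, 3, 5, 2, 7),
  group   = factor(rep(c("1", "2"), each = 5))
)

if (requireNamespace("coin", quietly = TRUE)) {
  # Exact conditional distribution of the rank-sum statistic given the ties
  res <- coin::wilcox_test(measure ~ group, data = toy,
                           distribution = "exact")
  print(res)
  p_exact <- as.numeric(coin::pvalue(res))
} else {
  # Assumption: coin may not be available on every system
  message("Package 'coin' is not installed; skipping the example.")
  p_exact <- NA_real_
}
```

For larger samples, where the exact distribution becomes expensive, a Monte Carlo approximation of the same permutation distribution is an alternative.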
I'd be more than happy to give more detail if it would be helpful. Many thanks in advance.
UPDATE: Simulated data (same group means and SDs) here:
df <- structure(list(measure = c(9, 15, 6, 7, 8, 7, 12, 5, 14, 9, 7,
13, 8, 14, 11, 16, 9, 7, 3, 8, 3, 21, 4, 3, 11, 13, 5, 7, 8,
15, 5, 15, 3, 9, 5, 2, 8, 6, 1, 1, 7, 6, 9, 5, 6, 2, 6, 10, 6,
6, 8, 6, 9, 8, 6, 2, 6, 2, 9, 5, 6, 4, 10, 7, 9, 8, 6, 4, 6,
14, 1, 12, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 2, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0), group = structure(c(3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", "2", "3"
), class = "factor")), row.names = c(NA, -122L), class = "data.frame")
Upvotes: 1
Views: 1822
Reputation: 78927
What you need is a Kruskal-Wallis test, the non-parametric counterpart to one-way ANOVA.
Edit:
library(dplyr)
library(ggpubr)
# group as factor
df$group <- as.factor(df$group)
# check for levels
levels(df$group)
# summarise with dplyr
group_by(df, group) %>%
  summarise(
    count = n(),
    mean = mean(measure, na.rm = TRUE),
    sd = sd(measure, na.rm = TRUE),
    median = median(measure, na.rm = TRUE),
    IQR = IQR(measure, na.rm = TRUE)
  )
# Box Plot measure by group and color by group
library("ggpubr")
ggboxplot(df, x = "group", y = "measure",
          color = "group", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
          order = c("1", "2", "3"),
          ylab = "measure", xlab = "group")
# Mean plot of measure by group (mean +/- standard error, with jittered points)
ggline(df, x = "group", y = "measure",
       add = c("mean_se", "jitter"),
       order = c("1", "2", "3"),
       ylab = "measure", xlab = "group")
# kruskal test
kruskal.test(measure ~ group, data = df)
## output:
## Kruskal-Wallis rank sum test
## data: measure by group
## Kruskal-Wallis chi-squared = 92.593, df = 2, p-value < 2.2e-16
### interpretation: the distribution of measure differs significantly in at least one of groups 1, 2 and 3 (the test compares rank distributions, not group means)
# pairwise comparisons between group levels
pairwise.wilcox.test(df$measure, df$group,
p.adjust.method = "bonferroni")
## output:
## Pairwise comparisons using Wilcoxon rank sum test with continuity correction
## data: df$measure and df$group
##   1       2
## 2 4.2e-16 -
## 3 6.9e-16 0.013
## interpretation: after Bonferroni adjustment, every pair of groups differs significantly (all adjusted p-values < 0.05)
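To make the adjustment step concrete, here is a small sketch of what `p.adjust.method = "bonferroni"` does, using hypothetical unadjusted p-values (the numbers in `raw_p` are made up and unrelated to the output above). Holm's method, the default in `pairwise.wilcox.test()`, is shown for comparison.

```r
# Hypothetical unadjusted p-values for three pairwise comparisons
raw_p <- c(0.01, 0.02, 0.5)

# Bonferroni: multiply each p-value by the number of comparisons, cap at 1
bonf <- p.adjust(raw_p, method = "bonferroni")
bonf   # 0.03 0.06 1.00

# Holm: step-down version, never less powerful than Bonferroni
holm <- p.adjust(raw_p, method = "holm")
holm   # 0.03 0.04 0.50
```

With only three comparisons the two methods often lead to the same conclusions, but Holm can reject more hypotheses while controlling the same family-wise error rate.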
Upvotes: 1