Zcjth84

Reputation: 13

Wilcoxon signed rank test for heavily tied data

I am having trouble conducting Wilcoxon test analyses on heavily tied data. I have outlined my problem as best I can below, how I have tried to address it, and the questions I have. I'd be really grateful for any advice anyone could give me.

My Problem

I am working on a dataset where I need to compare three groups on a measure which was used for group assignment. When I run a one-way ANOVA, neither (1) the assumption of normality of residuals nor (2) the assumption of homogeneity of variance of residuals is met.

I therefore used the Wilcoxon test to conduct pairwise comparisons in R with the following code (example for one comparison, two-sided alternative hypothesis as default):

wilcox.test(measure ~ group, data = myreduceddataset, na.rm = TRUE, paired = FALSE, exact = TRUE, conf.int = TRUE)

However, the output of my analysis looked strange to me (screenshot of example here), and every comparison produced warnings (one example copied below):

Warning messages:
1: In wilcox.test.default(x = c(2, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0,  :
  cannot compute exact p-value with ties
2: In wilcox.test.default(x = c(2, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0,  :
  cannot compute exact confidence intervals with ties
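These warnings arise because the exact null distribution used by wilcox.test() assumes there are no ties. When ties are present, the R version that printed them falls back to a normal approximation. A minimal sketch, on made-up toy data (not the dataset from the question), of requesting that approximation explicitly with exact = FALSE:

```r
# Hypothetical toy data with many tied zeros (not the data from the question)
x <- c(2, 1, 0, 2, 0, 0, 0, 0, 0, 0)
y <- c(6, 2, 1, 1, 7, 6, 9, 5, 6, 2)

# exact = FALSE asks for the normal approximation up front; its variance is
# corrected for ties, and correct = TRUE applies a continuity correction
res <- wilcox.test(x, y, paired = FALSE, exact = FALSE, correct = TRUE)

res$statistic  # the W statistic
res$p.value    # p-value from the normal approximation
```

Requesting the approximation explicitly only records which method was used. Whether that approximation is adequate for heavily tied data is a separate question.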

Checking the data

I then checked the data and looked at how the values are ranked in R to try to understand the warnings. Although there are some tied ranks throughout, the main problem seems to be the number of 0 values in Group 1. Here is some example raw and ranked data by group.

The solution I found, and questions this raised

From reading around, it appears that the solution could be to use the Wilcoxon test from the 'coin' package in R.

I had a go, and here is an example of my output. However, I am still not entirely clear on whether this is correct, and I have outlined some questions I still have below.

  1. I am not sure if an asymptotic test or an exact test is more appropriate for this dataset (the output appears to be the same)
  2. I am assuming I should use the coin::wilcox_test() not the coin::wilcoxsign_test(), as I am comparing samples from independent groups. Is this correct?
  3. If I am understanding correctly, the 'Z' value is the effect size. How do I derive the W statistic? Or can I just report the effect size?
  4. I am not sure how to correct this output for multiple comparisons
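On question 3: the Z that coin prints is a standardized test statistic rather than an effect size. A rough sketch, on made-up two-group data, of getting W from base R and deriving one commonly used effect size, r = |Z| / sqrt(N). The data, the variable names, and the choice of r are assumptions for illustration, and the coin call runs only if that package is installed:

```r
# Hypothetical two-group toy data with many tied zeros
d <- data.frame(
  measure = c(0, 0, 0, 1, 0, 2, 0, 0, 1, 0,
              6, 2, 1, 5, 6, 4, 10, 7, 2, 6),
  group = factor(rep(c("1", "2"), each = 10))
)

# Normal approximation without continuity correction, so that the two-sided
# p-value can be turned back into a standardized statistic
res <- wilcox.test(measure ~ group, data = d, exact = FALSE, correct = FALSE)

W <- unname(res$statistic)     # the W statistic to report
z <- qnorm(res$p.value / 2)    # magnitude of Z recovered from the p-value
r <- abs(z) / sqrt(nrow(d))    # effect size r = |Z| / sqrt(N), one convention

# Optional: the exact conditional test from coin, which accommodates ties
if (requireNamespace("coin", quietly = TRUE)) {
  ct <- coin::wilcox_test(measure ~ group, data = d, distribution = "exact")
  print(coin::statistic(ct))   # standardized statistic (Z)
  print(coin::pvalue(ct))      # exact p-value
}
```

With this approach W, Z, and r can all be reported together, so there is no need to choose between the test statistic and the effect size.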

I'd be more than happy to give more detail if it would be helpful. Many thanks in advance.

UPDATE: Simulated data (same group means and SDs) here:

structure(list(measure = c(9, 15, 6, 7, 8, 7, 12, 5, 14, 9, 7, 
13, 8, 14, 11, 16, 9, 7, 3, 8, 3, 21, 4, 3, 11, 13, 5, 7, 8, 
15, 5, 15, 3, 9, 5, 2, 8, 6, 1, 1, 7, 6, 9, 5, 6, 2, 6, 10, 6, 
6, 8, 6, 9, 8, 6, 2, 6, 2, 9, 5, 6, 4, 10, 7, 9, 8, 6, 4, 6, 
14, 1, 12, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 1, 1, 2, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0), group = structure(c(3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", "2", "3"
), class = "factor")), row.names = c(NA, -122L), class = "data.frame")

Upvotes: 1

Views: 1822

Answers (1)

TarJae

Reputation: 78927

What you need is a Kruskal-Wallis test, the non-parametric counterpart to one-way ANOVA.

Edit:

library(dplyr)
library(ggpubr)
# df is the data frame from the dput() output in the question
# group as factor
df$group <- as.factor(df$group)
# check for levels
levels(df$group)
# summarise with dplyr
group_by(df, group) %>%
  summarise(
    count = n(),
    mean = mean(measure, na.rm = TRUE),
    sd = sd(measure, na.rm = TRUE),
    median = median(measure, na.rm = TRUE),
    IQR = IQR(measure, na.rm = TRUE)
  )
# Box plot of measure by group, colored by group
ggboxplot(df, x = "group", y = "measure", 
          color = "group", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
          order = c("1", "2", "3"),
          ylab = "measure", xlab = "group")

# Mean plot of measure by group (mean with standard error, plus jittered points)
ggline(df, x = "group", y = "measure", 
       add = c("mean_se", "jitter"), 
       order = c("1", "2", "3"),
       ylab = "measure", xlab = "group")
# kruskal test
kruskal.test(measure ~ group, data = df)

## output:
##   Kruskal-Wallis rank sum test
##
## data:  measure by group
## Kruskal-Wallis chi-squared = 92.593, df = 2, p-value < 2.2e-16

## interpretation: at least one group differs significantly from the others
## (the test compares rank distributions across groups 1, 2 and 3, not group means)



# pairwise comparisons between group levels
pairwise.wilcox.test(df$measure, df$group,
                     p.adjust.method = "bonferroni")

## output:
##   Pairwise comparisons using Wilcoxon rank sum test with continuity correction
##
## data:  df$measure and df$group
##
##   1       2
## 2 4.2e-16 -
## 3 6.9e-16 0.013

## interpretation: every pairwise difference between groups is significant after Bonferroni correction
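
The pairwise step above uses Bonferroni. A small sketch of the same call with the Holm adjustment, which also controls the family-wise error rate and is never less powerful than Bonferroni. The toy data are made up for illustration, and exact = FALSE is passed through to wilcox.test() so the normal approximation is requested explicitly for the tied values:

```r
# Hypothetical three-group toy data with tied zeros in group 1
d <- data.frame(
  measure = c(0, 0, 0, 1, 0, 2, 0, 0,
              6, 2, 1, 5, 6, 4, 10, 7,
              9, 15, 7, 12, 8, 14, 11, 9),
  group = factor(rep(c("1", "2", "3"), each = 8))
)

# Omnibus test first, then pairwise comparisons with Holm-adjusted p-values
kw <- kruskal.test(measure ~ group, data = d)
pw <- pairwise.wilcox.test(d$measure, d$group,
                           p.adjust.method = "holm",
                           exact = FALSE)

kw$p.value   # Kruskal-Wallis p-value
pw$p.value   # lower-triangular matrix of adjusted pairwise p-values
```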


Upvotes: 1
