Reputation: 75
While I was trying to perform a paired Wilcoxon test for the variable "Metabolite", I noticed different p-values between wilcox_test()
& wilcox.test()
when I tested the following variable:
structure(list(Visit = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L), .Label = c("BSL", "LV"), class = "factor"), Metabolite = c(NA,
9.602, 9.0102, 4.524, 3.75, NA, 6.596, 7.065, 6.877, NA, NA,
10.1846, 13.521, 7.8219, NA, 4.9149, 4.0754, 4.7635, 8.8554,
4.3442, NA, 16.659, NA, 3.698, 6.623, 5.158, 11.719, 3.206, NA,
2.225, 7.417, 1.42, NA, NA, 2.752, 6.504, 7.594, 6.652, NA, NA,
3.784, 2.7311, 4.1749, 2.6659, 0.5592, NA, 4.2326, 4.3808, 3.624,
4.29, 7.098, 6.532, 3.699, 9.297, 8.275, NA)), class = "data.frame", row.names = c(NA,
-56L))
# The p-value derived from result 1 (p=0.0079) is different from that from result 2 (p=0.003279):
result1 <- wilcox_test(data=Data_pairs, Metabolite~Visit, paired = TRUE)
# A tibble: 1 x 7
#.y. group1 group2 n1 n2 statistic p
#* <chr> <chr> <chr> <int> <int> <dbl> <dbl>
# 1 Metabolite BSL LV 28 28 131 0.0079
result2 <- wilcox.test(data=Data_pairs, Metabolite~Visit, paired = TRUE)
# Wilcoxon signed rank exact test
#data: Metabolite by Visit
#V = 197, p-value = 0.003279
#alternative hypothesis: true location shift is not equal to 0
Using different statistical software, it seems that the wrong p-value is that derived from result2
.
Is there any suggestion/advice on how to correct my code and what is the reason for this difference?
Thank you in advance for your help.
Upvotes: 2
Views: 1318
Reputation: 46978
You should point out you are using rstatix to avoid the confusion:
rstatix::wilcox_test(data=Data_pairs, Metabolite~Visit, paired = TRUE)
# A tibble: 1 x 7
.y. group1 group2 n1 n2 statistic p
* <chr> <chr> <chr> <int> <int> <dbl> <dbl>
1 Metabolite BSL LV 28 28 131 0.0079
If it is indeed paired data, then you need to ensure you provide the complete pairs, taking your data, we can place them side by side, assuming the your observations 1 to 28 and from the same sample as 29 to 56:
da = do.call(cbind,split(Data_pairs$Metabolite,Data_pairs$Visit))
da
BSL LV
[1,] NA NA
[2,] 9.6020 2.2250
[3,] 9.0102 7.4170
[4,] 4.5240 1.4200
[5,] 3.7500 NA
[6,] NA NA
[7,] 6.5960 2.7520
[8,] 7.0650 6.5040
[9,] 6.8770 7.5940
[10,] NA 6.6520
[11,] NA NA
[12,] 10.1846 NA
[13,] 13.5210 3.7840
[14,] 7.8219 2.7311
[15,] NA 4.1749
[16,] 4.9149 2.6659
[17,] 4.0754 0.5592
[18,] 4.7635 NA
[19,] 8.8554 4.2326
[20,] 4.3442 4.3808
[21,] NA 3.6240
[22,] 16.6590 4.2900
[23,] NA 7.0980
[24,] 3.6980 6.5320
[25,] 6.6230 3.6990
[26,] 5.1580 9.2970
[27,] 11.7190 8.2750
[28,] 3.2060 NA
We test only the complete:
wilcox.test(da[complete.cases(da),1],da[complete.cases(da),2],paired=TRUE)
data: da[complete.cases(da), 1] and da[complete.cases(da), 2]
V = 131, p-value = 0.007904
alternative hypothesis: true location shift is not equal to 0
Upvotes: 2