Geli
Geli

Reputation: 75

Different P-values between wilcox_test() & wilcox.test() in R

While I was trying to perform a paired Wilcoxon test for the variable "Metabolite", I noticed different p-values between wilcox_test() & wilcox.test() when I tested the following variable:

structure(list(Visit = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L), .Label = c("BSL", "LV"), class = "factor"), Metabolite = c(NA, 
9.602, 9.0102, 4.524, 3.75, NA, 6.596, 7.065, 6.877, NA, NA, 
10.1846, 13.521, 7.8219, NA, 4.9149, 4.0754, 4.7635, 8.8554, 
4.3442, NA, 16.659, NA, 3.698, 6.623, 5.158, 11.719, 3.206, NA, 
2.225, 7.417, 1.42, NA, NA, 2.752, 6.504, 7.594, 6.652, NA, NA, 
3.784, 2.7311, 4.1749, 2.6659, 0.5592, NA, 4.2326, 4.3808, 3.624, 
4.29, 7.098, 6.532, 3.699, 9.297, 8.275, NA)), class = "data.frame", row.names = c(NA, 
-56L))



# The p-value derived from result 1 (p=0.0079) is different from that from result 2 (p=0.003279):


        result1 <- wilcox_test(data=Data_pairs,  Metabolite~Visit, paired = TRUE)
        
        # A tibble: 1 x 7
        #.y.     group1 group2    n1    n2 statistic      p
        #* <chr>   <chr>  <chr>  <int> <int>     <dbl>  <dbl>
        #        1 Metabolite BSL    LV        28    28       131 0.0079
        
        
        result2 <- wilcox.test(data=Data_pairs,  Metabolite~Visit, paired = TRUE)
        #   Wilcoxon signed rank exact test
        
        #data:  Metabolite by Visit
        #V = 197, p-value = 0.003279
        #alternative hypothesis: true location shift is not equal to 0
 

Using different statistical software, it seems that the wrong p-value is that derived from result2.

Is there any suggestion/advice on how to correct my code and what is the reason for this difference?

Thank you in advance for your help.

Upvotes: 2

Views: 1318

Answers (1)

StupidWolf
StupidWolf

Reputation: 46978

You should point out you are using rstatix to avoid the confusion:

rstatix::wilcox_test(data=Data_pairs,  Metabolite~Visit, paired = TRUE)
# A tibble: 1 x 7
  .y.        group1 group2    n1    n2 statistic      p
* <chr>      <chr>  <chr>  <int> <int>     <dbl>  <dbl>
1 Metabolite BSL    LV        28    28       131 0.0079

If it is indeed paired data, then you need to ensure you provide the complete pairs, taking your data, we can place them side by side, assuming the your observations 1 to 28 and from the same sample as 29 to 56:

da = do.call(cbind,split(Data_pairs$Metabolite,Data_pairs$Visit))
da

          BSL     LV
 [1,]      NA     NA
 [2,]  9.6020 2.2250
 [3,]  9.0102 7.4170
 [4,]  4.5240 1.4200
 [5,]  3.7500     NA
 [6,]      NA     NA
 [7,]  6.5960 2.7520
 [8,]  7.0650 6.5040
 [9,]  6.8770 7.5940
[10,]      NA 6.6520
[11,]      NA     NA
[12,] 10.1846     NA
[13,] 13.5210 3.7840
[14,]  7.8219 2.7311
[15,]      NA 4.1749
[16,]  4.9149 2.6659
[17,]  4.0754 0.5592
[18,]  4.7635     NA
[19,]  8.8554 4.2326
[20,]  4.3442 4.3808
[21,]      NA 3.6240
[22,] 16.6590 4.2900
[23,]      NA 7.0980
[24,]  3.6980 6.5320
[25,]  6.6230 3.6990
[26,]  5.1580 9.2970
[27,] 11.7190 8.2750
[28,]  3.2060     NA

We test only the complete:

wilcox.test(da[complete.cases(da),1],da[complete.cases(da),2],paired=TRUE)

data:  da[complete.cases(da), 1] and da[complete.cases(da), 2]
V = 131, p-value = 0.007904
alternative hypothesis: true location shift is not equal to 0

Upvotes: 2

Related Questions