awz1

Reputation: 449

Wilcox test on values in a single data frame column, based on condition from another column

I am trying to run a Wilcoxon test on a single column, conditioned on the value of another column, but I keep getting an error like the one below:

Error in wilcox.test.default(mtcars %>% filter(am == 1) %>% select(mpg), : 'x' must be numeric

I have produced an example below using the mtcars dataset. Could someone advise me on what I'm doing wrong?

wilcox.test(mtcars%>%filter(am==1)%>%select(mpg),
            mtcars%>%filter(am==0)%>%select(mpg))

Is it because the two inputs to the test have different lengths?

Upvotes: 1

Views: 789

Answers (1)

Chuck P

Reputation: 3923

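While I agree with @Dave2e that using the formula interface would be much cleaner, if you want to use dplyr you have to pull mpg rather than select it: select() returns a one-column data frame, whereas pull() returns the numeric vector that wilcox.test() expects.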

library(dplyr)


wilcox.test(mtcars %>% filter(am==1) %>% pull(mpg),
            mtcars %>% filter(am==0) %>% pull(mpg))
#> Warning in wilcox.test.default(mtcars %>% filter(am == 1) %>% pull(mpg), :
#> cannot compute exact p-value with ties
#> 
#>  Wilcoxon rank sum test with continuity correction
#> 
#> data:  mtcars %>% filter(am == 1) %>% pull(mpg) and mtcars %>% filter(am == 0) %>% pull(mpg)
#> W = 205, p-value = 0.001871
#> alternative hypothesis: true location shift is not equal to 0
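
To see why the original call fails, it may help to inspect what each verb returns. The snippet below is only a small sketch; the names selected and pulled are mine, not from the question. It also suggests that the differing group sizes are not the problem, since the unpaired rank sum test accepts samples of different lengths.

```r
library(dplyr)

# select() keeps the data frame structure, even for a single column
# ('selected' and 'pulled' are illustrative names)
selected <- mtcars %>% filter(am == 1) %>% select(mpg)
is.data.frame(selected)  # TRUE
is.numeric(selected)     # FALSE, hence "'x' must be numeric"

# pull() extracts the column as a plain numeric vector
pulled <- mtcars %>% filter(am == 1) %>% pull(mpg)
is.numeric(pulled)       # TRUE
length(pulled)           # 13 cars with am == 1
```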

# The same test via the formula interface; here am == 0 is treated as the first group
wilcox.test(mtcars$mpg ~ mtcars$am)
#> Warning in wilcox.test.default(x = c(21.4, 18.7, 18.1, 14.3, 24.4, 22.8, :
#> cannot compute exact p-value with ties
#> 
#>  Wilcoxon rank sum test with continuity correction
#> 
#> data:  mtcars$mpg by mtcars$am
#> W = 42, p-value = 0.001871
#> alternative hypothesis: true location shift is not equal to 0

N.B. If you want exactly the same output from both calls, reverse the order of your filters. The W statistic is computed for the first sample given (the formula interface puts am == 0 first), whereas the p-value is the same either way.
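
As a rough check of the point about ordering, the sketch below runs the test in both directions and compares the results. The variable names (auto, manual, w_auto_first, w_manual_first) are hypothetical, and suppressWarnings() is used only to hide the ties warning shown above.

```r
library(dplyr)

# Illustrative names: 'auto' is am == 0, 'manual' is am == 1
auto   <- mtcars %>% filter(am == 0) %>% pull(mpg)
manual <- mtcars %>% filter(am == 1) %>% pull(mpg)

# suppressWarnings() hides "cannot compute exact p-value with ties"
w_auto_first   <- suppressWarnings(wilcox.test(auto, manual))
w_manual_first <- suppressWarnings(wilcox.test(manual, auto))

w_auto_first$statistic    # W = 42, matches the formula interface
w_manual_first$statistic  # W = 205, matches the dplyr call above

# The two W values add up to n1 * n2
length(auto) * length(manual)  # 247

# The two-sided p-value does not depend on the order
all.equal(w_auto_first$p.value, w_manual_first$p.value)
```

So calling wilcox.test() with the am == 0 sample first should reproduce W = 42 from the formula version.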

Upvotes: 1
