J.Q

Reputation: 1031

R - NA handling in a Mann-Whitney U test

I need to compute weighted Mann-Whitney U test results a few hundred times. Each iteration is a two-sample test for differences between two groups. I can't figure out how to get the existing function to handle missing values without dynamically deleting cases.

The data for a few of the comparisons are here, in a data frame I call dat. All variables with numbers in this sheet are numeric in type.

Here's how I call the sjstats::mannwhitney() function:

mannwhitney(dat, measure1, group)

When I do so, I get the following error:

Error in `[[<-.data.frame`(`*tmp*`, "grp1.label", value = character(0)) : 
  replacement has 0 rows, data has 1

I suspect this is because of the missing value in the 212th observation of measure1. But wrapping the vector names in na.omit() or !is.na() doesn't address the problem, perhaps because doing so still results in a data frame where the number of non-NA values of group is greater than the number of non-NA values of measure1.

Any thoughts on how I could incorporate dynamic NA handling into the function call?

Upvotes: 1

Views: 458

Answers (1)

StupidWolf

Reputation: 46958

I am not sure what class your group column is, but if I do it like this:

library(sjstats)
dat = read.csv("question - Sheet1.csv")
str(dat)

'data.frame':   301 obs. of  5 variables:
 $ measure1 : num  2 1.6 2.2 2.7 1.8 1.8 4 4 3.9 -3.7 ...
 $ measure2 : num  0.9 0.1 0 0.4 -1 -1.3 2.1 0 -1.1 -3.9 ...
 $ measure3 : num  1.1 1.1 2.2 1.2 1.9 1.2 0 3 1.9 -3.8 ...
 $ measurre4: num  2 2 2 3 3 2 3 4 3 2.36 ...
 $ group    : int  0 0 0 0 0 0 0 0 0 0 ...

I get:

mannwhitney(dat, measure1, group)
Error in `[[<-.data.frame`(`*tmp*`, "grp1.label", value = character(0)) : 
  replacement has 0 rows, data has 1

Convert your group column to a factor:

dat$group = factor(dat$group)
mannwhitney(dat, measure1, group)

# Mann-Whitney-U-Test

Groups 1 = 0 (n = 110) | 2 = 1 (n = 190):
  U = 16913.000, W = 10808.000, p = 0.621, Z = 0.495
  effect-size r =   0.029
   rank-mean(1) = 153.75
   rank-mean(2) = 148.62

Reading the code, the bug comes from this:

labels <- sjlabelled::get_labels(grp, attr.only = F, values = NULL, 
        non.labelled = T)

If your group is numeric, it doesn't have label attributes, and hence you get no labels:

sjlabelled::get_labels(0:1)
NULL

sjlabelled::get_labels(factor(0:1))
[1] "0" "1"
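
If you need to repeat this over many measures, one possible approach is to convert group to a factor once and drop incomplete rows per comparison, so the measure and the group always stay the same length. The sketch below is only an illustration under some assumptions: the data frame is synthetic (not your sheet), the list of measure names is made up, and base R's wilcox.test() stands in for the sjstats call, so the output format and the weighting differ from what mannwhitney() gives you.

```r
# Synthetic stand-in for `dat`; the real sheet is not reproduced here.
set.seed(1)
dat <- data.frame(
  measure1 = rnorm(20),
  measure2 = rnorm(20),
  group    = rep(0:1, each = 10)
)
dat$measure1[5] <- NA  # mimic a single missing observation

# Convert the grouping column once, so every test sees a factor.
dat$group <- factor(dat$group)

# Hypothetical list of columns to test.
measures <- c("measure1", "measure2")

results <- lapply(measures, function(m) {
  # Keep only rows where both the measure and the group are present,
  # so the two columns stay aligned.
  keep <- complete.cases(dat[, c(m, "group")])
  sub  <- dat[keep, c(m, "group")]

  # Base R test used as a stand-in for the sjstats function.
  test <- wilcox.test(reformulate("group", response = m), data = sub)

  data.frame(
    measure = m,
    n       = nrow(sub),
    W       = unname(test$statistic),
    p       = test$p.value
  )
})

results <- do.call(rbind, results)
print(results)
```

Subsetting rows with complete.cases() on the pair of columns, rather than wrapping each vector in na.omit() separately, is what keeps the measure and the group the same length in each comparison.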

Upvotes: 1
