Grace_G

Reputation: 53

Data analysis in R: how to describe the distribution of NA positions in a vector?

I want the NA positions to be uniformly distributed across the vector (length = 30, fewer than 6 NAs).

This vector has length 30 and contains 4 NAs. It is easy to see that the NAs are not spread uniformly; most of them are at the left end.

vector_x <- c(NA,3, NA, 1, NA, 5, 6, 7, 7, 9, 0, 2, 12, 324, 54,23, 12, 324, 122, 23, 324, 332, 45, 78, 32, 12, 342, 95, 67, NA)

But I don't know which statistic or test to use to describe this. I want a number that lets me screen vectors quantitatively against a cutoff.

So far I have two preliminary ideas.
To simplify the problem, treat every NA as 0 and every number as 1, then look at the distribution of that binary vector.
Or take the indices of the NAs, c(1, 3, 5, 30), and analyze their variance.

Thanks for any suggestions!

Upvotes: 0

Views: 99

Answers (2)

CPak

Reputation: 13591

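You can use a Mann-Whitney U test, also known as the Wilcoxon rank-sum test (the second name is more descriptive of what it does). It compares the positions of the NA elements with the positions of the non-NA elements.

This is easy to do with your data: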

v <- vector_x  # the vector from the question
which(is.na(v))
# [1]  1  3  5 30

which(!is.na(v))
# [1]  2  4  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29

wilcox.test(which(is.na(v)), which(!is.na(v)))

        # Wilcoxon rank sum test

# data:  which(is.na(v)) and which(!is.na(v))
# W = 29, p-value = 0.1766
# alternative hypothesis: true location shift is not equal to 0

Check that wilcox.test behaves as expected on two simple cases:

wilcox.test(1:5, 6:10)  # low p value
wilcox.test(seq(1,10,2), seq(2,10,2)) # high p value
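
To screen many vectors against a cutoff, as the question asks, the test can be wrapped in a small helper. This is only a sketch: the function name na_position_pvalue and the 0.05 cutoff are assumptions, not something from the question, and with so few NAs per vector the test has limited power.

```r
# Hypothetical helper: p-value of a Wilcoxon rank-sum test comparing
# the positions of NA elements with the positions of non-NA elements.
na_position_pvalue <- function(x) {
  na_idx <- which(is.na(x))
  ok_idx <- which(!is.na(x))
  # The test needs at least one index in each group
  if (length(na_idx) == 0 || length(ok_idx) == 0) {
    return(NA_real_)
  }
  wilcox.test(na_idx, ok_idx)$p.value
}

vector_x <- c(NA, 3, NA, 1, NA, 5, 6, 7, 7, 9, 0, 2, 12, 324, 54, 23, 12,
              324, 122, 23, 324, 332, 45, 78, 32, 12, 342, 95, 67, NA)

p <- na_position_pvalue(vector_x)

# Assumed cutoff of 0.05: TRUE would flag the vector as having
# NA positions shifted toward one end.
flagged <- p < 0.05
```

For the example vector this reproduces the p-value shown above, so the vector is not flagged at the assumed cutoff even though three of the four NAs sit at the left.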

Upvotes: 2

akrun

Reputation: 887741

If we need the indices of the NA elements, use is.na to convert the vector to a logical vector; which then returns the numeric indices where it is TRUE

which(is.na(vector_x))
#[1]  1  3  5 30

Or convert it to a binary vector where 0 represents NA and 1 represents any other value

as.integer(!is.na(vector_x))
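
One simple way to describe where the NAs fall, using either form above, is to count them per block of positions. This is a sketch only; splitting the 30 positions into three blocks of 10 is an assumption, and the variable names are illustrative.

```r
vector_x <- c(NA, 3, NA, 1, NA, 5, 6, 7, 7, 9, 0, 2, 12, 324, 54, 23, 12,
              324, 122, 23, 324, 332, 45, 78, 32, 12, 342, 95, 67, NA)

is_present <- as.integer(!is.na(vector_x))  # 0 = NA, 1 = value
na_idx <- which(is_present == 0)            # positions 1, 3, 5, 30

# Assign each NA position to a block: (0,10], (10,20], (20,30]
blocks <- cut(na_idx, breaks = c(0, 10, 20, 30))

# Number of NAs in each block
na_per_block <- table(blocks)
na_per_block
# counts per block: 3, 0, 1
```

Roughly equal counts per block would suggest evenly spread NAs; here the first block holds three of the four.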

Upvotes: 1
