lazer-guided-lazerbeam
lazer-guided-lazerbeam

Reputation: 254

R - Cleanest way to run statistical test on every permutation of multiple populations

I have three populations stored as individual vectors. I need to run a statistical test (wilcoxon, if it matters) on each pair of these three populations.

I want to input three vectors into some block of code and get as output a vector of 6 p-values (one p-value is the result of one test and is a double).

I have a method that works but I am new to R and from what I've been reading I feel like there should be a better way, possibly involving storing the vectors as a data frame and using vectorization, to write this code.

Here is the code I have:

library(arrangements)

runAllTests <- function(pop1,pop2,pop3) {
    populations <- list(pop1=pop1,pop2=pop2,pop3=pop3)
    colLabels <- c("pop1", "pop2", "pop3")

    #This line makes a data frame where each column is a pair of labels
    perms <- data.frame(t(permutations(colLabels,2)))

    pvals <- vector()

    #This for loop gets each column of that data frame
    for (pair in perms[,]) {
        pair <- as.vector(pair)
        p1 <- as.numeric(unlist(populations[pair[1]]))
        p2 <- as.numeric(unlist(populations[pair[2]]))

        pvals <- append(pvals, wilcox.test(p1, p2,alternative=c("less"))$p.value)
    }

    return(pvals)
}

What is a more R appropriate way to write this code?

Note: Generating populations and comparing them all to each other is a common enough thing (and tricky enough to code) that I think this question will apply to more people than myself.

EDIT: I forgot that my actual populations are of different sizes. This means I cannot make a data frame out of the vectors (as far as I know). I can make a list of vectors though. I have updated my code with a version that works.

Upvotes: 3

Views: 253

Answers (2)

Aaron - mostly inactive
Aaron - mostly inactive

Reputation: 37754

Yes, this is indeed common; indeed so common that R has a built-in function for exactly this scenario: pairwise.table.

p <- list(pop1, pop2, pop3)
pairwise.table(function(i, j) { 
   wilcox.test(p[[i]], p[[j]])$p.value 
}, 1:3)

There are also specific versions for t tests, proportion tests, and Wilcoxon tests; here's an example using pairwise.wilcox.test.

p <- list(pop1, pop2, pop3)
d <- data.frame(x=unlist(p), g=rep(seq_along(p), sapply(p, length)))
with(d, pairwise.wilcox.test(x, g))

Also, make sure you look into the p.adjust.method parameter to correctly adjust for multiple comparisons.

Per your comments, you're interested in tests where the order matters; that's really hard to imagine (and isn't true for the Wilcoxon test you mentioned) but still...

This is the pairwise.table function, edited to do tests in both directions.

pairwise.table.all <- function (compare.levels, level.names, p.adjust.method) {
  ix <- setNames(seq_along(level.names), level.names)
  pp <- outer(ix, ix, function(ivec, jvec) 
    sapply(seq_along(ivec), function(k) {
             i <- ivec[k]; j <- jvec[k]
             if (i != j) compare.levels(i, j) else NA }))
  pp[] <- p.adjust(pp[], p.adjust.method)
  pp
}

This is a version of pairwise.wilcox.test which uses the above function, and also runs on a list of vectors, instead of a data frame in long format.

pairwise.lazerbeam.test <- function(dat, p.adjust.method=p.adjust.methods) {
  p.adjust.method <- match.arg(p.adjust.method)
  level.names <- if(!is.null(names(dat))) names(dat) else seq_along(dat)
  PVAL <- pairwise.table.all(function(i, j) { 
    wilcox.test(dat[[i]], dat[[j]])$p.value 
  }, level.names, p.adjust.method = p.adjust.method)
  ans <- list(method = "Lazerbeam's special method", 
              data.name = paste(level.names, collapse=", "), 
              p.value = PVAL, p.adjust.method = p.adjust.method)
  class(ans) <- "pairwise.htest"
  ans
}

Output, both before and after tidying, looks like this:

> p <- list(a=1:5, b=2:8, c=10:16)
> out <- pairwise.lazerbeam.test(p)

> out

    Pairwise comparisons using Lazerbeams special method 

data:  a, b, c 

  a      b      c     
a -      0.2821 0.0101
b 0.2821 -      0.0035
c 0.0101 0.0035 -     

P value adjustment method: holm 

> pairwise.lazerbeam.test(p) %>% broom::tidy()
# A tibble: 6 x 3
  group1 group2 p.value
  <chr>  <chr>    <dbl>
1 b      a      0.282  
2 c      a      0.0101 
3 a      b      0.282  
4 c      b      0.00350
5 a      c      0.0101 
6 b      c      0.00350

Upvotes: 3

lroha
lroha

Reputation: 34406

Here is an example of one approach that uses combn() which has a function argument that can be used to easily apply wilcox.test() to all variable combinations.

set.seed(234)

# Create dummy data
df <- data.frame(replicate(3, sample(1:5, 100, replace = TRUE)))

# Apply wilcox.test to all combinations of variables in data frame.
res <- combn(names(df), 2, function(x) list(data = c(paste(x[1], x[2])), p = wilcox.test(x = df[[x[1]]], y =  df[[x[2]]])$p.value), simplify = FALSE)

# Bind results
do.call(rbind, res) 

     data    p         
[1,] "X1 X2" 0.45282   
[2,] "X1 X3" 0.06095539
[3,] "X2 X3" 0.3162251 

Upvotes: 2

Related Questions