Use loop for multiple testing in data frame

Question

I would like a general function to perform multiple t-tests on the data in a data frame, using the following example data:

dat <- data.frame(ID=c(1:100),
                  DRUG= rep(c("D1","D2","D2","D3","D3","D3","D5","D1","D4","D2"),10),
                  ADR=rep(c("A1","A2","A3","A6","A7","A8","A4","A2","A1","A2"),10),
                  X= sample(1:250, 100, replace=FALSE))

Basically, I want to run two t-tests on the values of X for each unique DRUG-ADR combination. Taking D1-A1 as an example, I want to test the X values for D1-A1 against those for D1 with any other ADR (D1-A<>1), and the X values for D1-A1 against those for any other drug with A1 (D<>1-A1). Below is my code for this example, but my question is how to write a general loop or function that performs the two tests for each unique DRUG-ADR combination.

x <- ifelse (dat$DRUG == "D1" & dat$ADR == "A1",dat$X, NA)
x <- x[!is.na(x)]

y <- ifelse (dat$DRUG != "D1" & dat$ADR == "A1",dat$X, NA)
y <- y[!is.na(y)]

z <- ifelse (dat$DRUG == "D1" & dat$ADR != "A1",dat$X, NA)
z <- z[!is.na(z)]

t.test(x,y)
t.test(x,z)

So for record number 4 (D3-A6), the code would be:

x <- ifelse (dat$DRUG == "D3" & dat$ADR == "A6",dat$X, NA)
x <- x[!is.na(x)]

y <- ifelse (dat$DRUG != "D3" & dat$ADR == "A6",dat$X, NA)
y <- y[!is.na(y)]

z <- ifelse (dat$DRUG == "D3" & dat$ADR != "A6",dat$X, NA)
z <- z[!is.na(z)]

t.test(x,y)
t.test(x,z)

Anyone got a good idea for a general function?

EDIT: My ideal result would be the following table:

  Drug ADR pvalue1 pvalue2
1   D1  A1  pval11  pval21
2   D2  A2  pval12  pval22
3  D.. A.. pval1.. pval2..

Konrad Rudolph · Accepted Answer

As with most programming problems, the solution comes in two steps:

Abstract your logic to make it general
Encapsulate the abstract solution into a reusable function

Then you can proceed to:

Call the function repeatedly on all data.

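However, first off: t.test() throws an error when a group has too few observations (for D3-A6, for instance, no other drug is ever paired with A6, so the comparison group is empty). So let's replace the t.test calls with a wrapper that returns the p-value, or NA when the test fails: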

t_test = function (x, y, ...) {
    tryCatch(t.test(x, y, ...)$p.value, error = function (err) NA)
}
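A quick check of why the wrapper helps, as a base-R sketch. It rebuilds the question's data with an arbitrary set.seed(42) (an assumption, only for reproducibility) and subsets with logical indexing instead of the ifelse()/is.na() pattern. For D3-A6 the "other drug, same ADR" group is empty, so t.test() stops with an error while t_test() returns NA.

```r
set.seed(42)  # assumption: arbitrary seed so the random X values are reproducible
dat <- data.frame(ID   = 1:100,
                  DRUG = rep(c("D1","D2","D2","D3","D3","D3","D5","D1","D4","D2"), 10),
                  ADR  = rep(c("A1","A2","A3","A6","A7","A8","A4","A2","A1","A2"), 10),
                  X    = sample(1:250, 100, replace = FALSE),
                  stringsAsFactors = FALSE)

# Wrapper from the answer: the p-value, or NA if t.test() signals an error
t_test <- function(x, y, ...) {
  tryCatch(t.test(x, y, ...)$p.value, error = function(err) NA)
}

x <- dat$X[dat$DRUG == "D3" & dat$ADR == "A6"]  # D3-A6 itself
y <- dat$X[dat$DRUG != "D3" & dat$ADR == "A6"]  # other drugs with A6: none exist
z <- dat$X[dat$DRUG == "D3" & dat$ADR != "A6"]  # D3 with other ADRs

# A bare t.test() on the empty group fails ("not enough 'y' observations")
fails <- inherits(try(t.test(x, y), silent = TRUE), "try-error")

p_xy <- t_test(x, y)  # NA instead of an error
p_xz <- t_test(x, z)  # an ordinary p-value
```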

Then, all taken together, this gives us:

library(dplyr) # Makes data manipulation easier.

test_combination = function (data, id) {
    drug = data[id, ]$DRUG
    adr = data[id, ]$ADR

    match = filter(data, DRUG == drug, ADR == adr)$X
    mismatch1 = filter(data, DRUG != drug, ADR == adr)$X
    mismatch2 = filter(data, DRUG == drug, ADR != adr)$X

    list(pval1 = t_test(match, mismatch1), pval2 = t_test(match, mismatch2))
}
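For readers without dplyr, the same logic can be sketched in base R. This is a substitution, not the answer's code: it keeps the name and return value of test_combination(), looks the record up via data$ID == id rather than by row position (the same thing here, since ID equals the row number), and avoids the variable name match so that base::match() is not masked. The set.seed(42) call is an assumption for reproducibility.

```r
set.seed(42)  # assumption: arbitrary seed so the random X values are reproducible
dat <- data.frame(ID   = 1:100,
                  DRUG = rep(c("D1","D2","D2","D3","D3","D3","D5","D1","D4","D2"), 10),
                  ADR  = rep(c("A1","A2","A3","A6","A7","A8","A4","A2","A1","A2"), 10),
                  X    = sample(1:250, 100, replace = FALSE),
                  stringsAsFactors = FALSE)

t_test <- function(x, y, ...) {
  tryCatch(t.test(x, y, ...)$p.value, error = function(err) NA)
}

# Base-R variant of test_combination(): logical subsetting instead of filter()
test_combination <- function(data, id) {
  rec  <- data[data$ID == id, ]  # the record whose DRUG-ADR pair is tested
  drug <- rec$DRUG
  adr  <- rec$ADR

  same_both  <- data$X[data$DRUG == drug & data$ADR == adr]  # e.g. D1-A1
  other_drug <- data$X[data$DRUG != drug & data$ADR == adr]  # e.g. D<>1-A1
  other_adr  <- data$X[data$DRUG == drug & data$ADR != adr]  # e.g. D1-A<>1

  list(pval1 = t_test(same_both, other_drug),
       pval2 = t_test(same_both, other_adr))
}

res1 <- test_combination(dat, 1)  # D1-A1: both comparison groups have data
res4 <- test_combination(dat, 4)  # D3-A6: pval1 is NA, no other drug has A6
```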

This tests a single combination. Now we test all of them:

result = lapply(dat$ID, test_combination, data = dat) %>%
    bind_rows() %>%
    bind_cols(dat, .) %>%
    select(-X)
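The result above has one row per record (100 rows), so each DRUG-ADR pair appears several times and each pair's tests are rerun for every record. To get the one-row-per-combination table from the question, one option is to test each unique pair once. The sketch below is base R; test_pair() is a hypothetical helper introduced here, and set.seed(42) is an assumption for reproducibility. Pairs without a comparison group (D3-A6 for the first test, D5-A4 for the second, and so on) get NA.

```r
set.seed(42)  # assumption: arbitrary seed so the random X values are reproducible
dat <- data.frame(ID   = 1:100,
                  DRUG = rep(c("D1","D2","D2","D3","D3","D3","D5","D1","D4","D2"), 10),
                  ADR  = rep(c("A1","A2","A3","A6","A7","A8","A4","A2","A1","A2"), 10),
                  X    = sample(1:250, 100, replace = FALSE),
                  stringsAsFactors = FALSE)

t_test <- function(x, y, ...) {
  tryCatch(t.test(x, y, ...)$p.value, error = function(err) NA)
}

# Hypothetical helper: both tests for one DRUG-ADR pair, as a named vector
test_pair <- function(data, drug, adr) {
  same_both  <- data$X[data$DRUG == drug & data$ADR == adr]
  other_drug <- data$X[data$DRUG != drug & data$ADR == adr]
  other_adr  <- data$X[data$DRUG == drug & data$ADR != adr]
  c(pval1 = t_test(same_both, other_drug),
    pval2 = t_test(same_both, other_adr))
}

# One row per unique DRUG-ADR pair, in order of first appearance
combos <- unique(dat[c("DRUG", "ADR")])
rownames(combos) <- NULL

# mapply() returns a 2 x n matrix with rows "pval1" and "pval2"
pvals <- mapply(test_pair, combos$DRUG, combos$ADR,
                MoreArgs = list(data = dat), USE.NAMES = FALSE)

result <- data.frame(combos,
                     pval1 = pvals["pval1", ],
                     pval2 = pvals["pval2", ])
print(result)
```

Within the dplyr pipeline above, adding distinct(DRUG, ADR, .keep_all = TRUE) after bind_cols() would instead drop the duplicate rows, though every test would still be run once per record.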

Or, using a more dplyr-like (but in my opinion somewhat obscure) approach; note that do() has been superseded since dplyr 1.0.0:

result = dat %>%
    rowwise() %>%
    do(bind_rows(test_combination(dat, .$ID))) %>%
    bind_cols(dat, .) %>%
    select(-X)

Note how this code doesn’t use explicit for loops. This is the idiomatic way to process data in R: you apply a function to each item in a table or list, rather than iterating manually.
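Since the question asks about a loop, here is the explicit for-loop version for comparison. It is a sketch assuming a base-R test_combination() equivalent to the answer's dplyr one (substituted so no package is needed) and an arbitrary set.seed(42). The loop preallocates a list and fills one element per record; the result is identical to the lapply() call, which is the point: lapply() is the loop, minus the bookkeeping.

```r
set.seed(42)  # assumption: arbitrary seed so the random X values are reproducible
dat <- data.frame(ID   = 1:100,
                  DRUG = rep(c("D1","D2","D2","D3","D3","D3","D5","D1","D4","D2"), 10),
                  ADR  = rep(c("A1","A2","A3","A6","A7","A8","A4","A2","A1","A2"), 10),
                  X    = sample(1:250, 100, replace = FALSE),
                  stringsAsFactors = FALSE)

t_test <- function(x, y, ...) {
  tryCatch(t.test(x, y, ...)$p.value, error = function(err) NA)
}

# Base-R stand-in for the answer's dplyr test_combination()
test_combination <- function(data, id) {
  rec  <- data[data$ID == id, ]
  same_both  <- data$X[data$DRUG == rec$DRUG & data$ADR == rec$ADR]
  other_drug <- data$X[data$DRUG != rec$DRUG & data$ADR == rec$ADR]
  other_adr  <- data$X[data$DRUG == rec$DRUG & data$ADR != rec$ADR]
  list(pval1 = t_test(same_both, other_drug),
       pval2 = t_test(same_both, other_adr))
}

# Explicit loop: preallocate the list, then fill one element per record
loop_res <- vector("list", nrow(dat))
for (i in seq_along(dat$ID)) {
  loop_res[[i]] <- test_combination(dat, dat$ID[i])
}

# The apply-style equivalent used in the answer
apply_res <- lapply(dat$ID, test_combination, data = dat)

same <- identical(loop_res, apply_res)
```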

Note that the above is highly questionable, statistically speaking: it runs two tests for every DRUG-ADR combination, so at the very least you need to perform rigorous multiple testing correction.
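One way to apply such a correction is base R's p.adjust(). The sketch below assumes one test per unique DRUG-ADR pair (via a hypothetical helper test_pair()), pools both p-value columns into a single family, and uses Holm's method (p.adjust's default); the family and the method are statistical choices this sketch does not settle, and set.seed(42) is only for reproducibility. NA p-values stay NA and are not counted as comparisons.

```r
set.seed(42)  # assumption: arbitrary seed so the random X values are reproducible
dat <- data.frame(ID   = 1:100,
                  DRUG = rep(c("D1","D2","D2","D3","D3","D3","D5","D1","D4","D2"), 10),
                  ADR  = rep(c("A1","A2","A3","A6","A7","A8","A4","A2","A1","A2"), 10),
                  X    = sample(1:250, 100, replace = FALSE),
                  stringsAsFactors = FALSE)

t_test <- function(x, y, ...) {
  tryCatch(t.test(x, y, ...)$p.value, error = function(err) NA)
}

# Hypothetical helper: both tests for one DRUG-ADR pair
test_pair <- function(data, drug, adr) {
  same_both <- data$X[data$DRUG == drug & data$ADR == adr]
  c(pval1 = t_test(same_both, data$X[data$DRUG != drug & data$ADR == adr]),
    pval2 = t_test(same_both, data$X[data$DRUG == drug & data$ADR != adr]))
}

combos <- unique(dat[c("DRUG", "ADR")])
pvals  <- mapply(test_pair, combos$DRUG, combos$ADR,
                 MoreArgs = list(data = dat), USE.NAMES = FALSE)
result <- data.frame(combos, pval1 = pvals["pval1", ], pval2 = pvals["pval2", ])

# Assumption: treat all tests (both columns) as one family
raw <- c(result$pval1, result$pval2)
adj <- p.adjust(raw, method = "holm")  # NA entries are kept as NA

n <- nrow(result)
result$pval1_adj <- adj[seq_len(n)]
result$pval2_adj <- adj[n + seq_len(n)]
print(result)
```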
