Joep_S

Reputation: 537

Use loop for multiple testing in data frame

I would like to have a general function to perform multiple t.tests on data in a data frame with the following example data:

dat <- data.frame(ID=c(1:100),
                  DRUG= rep(c("D1","D2","D2","D3","D3","D3","D5","D1","D4","D2"),10),
                  ADR=rep(c("A1","A2","A3","A6","A7","A8","A4","A2","A1","A2"),10),
                  X= sample(1:250, 100, replace=F))

Basically, I want to run two t.tests for values of X for each unique combination of DRUG - ADR. If I take D1-A1 as an example, I want to test the X values for D1-A1 versus D1-A<>1 and the X values for D1-A1 versus D<>1-A1. Below is my syntax for this example, but my question is how to make a general loop / function to perform two tests for each unique combination of DRUG - ADR.

x <- ifelse (dat$DRUG == "D1" & dat$ADR == "A1",dat$X, NA)
x <- x[!is.na(x)]

y <- ifelse (dat$DRUG != "D1" & dat$ADR == "A1",dat$X, NA)
y <- y[!is.na(y)]

z <- ifelse (dat$DRUG == "D1" & dat$ADR != "A1",dat$X, NA)
z <- z[!is.na(z)]

t.test(x,y)
t.test(x,z)

So for record number 4 (D3-A6) the syntax would be:

x <- ifelse (dat$DRUG == "D3" & dat$ADR == "A6",dat$X, NA)
x <- x[!is.na(x)]

y <- ifelse (dat$DRUG != "D3" & dat$ADR == "A6",dat$X, NA)
y <- y[!is.na(y)]

z <- ifelse (dat$DRUG == "D3" & dat$ADR != "A6",dat$X, NA)
z <- z[!is.na(z)]

t.test(x,y)
t.test(x,z)

Anyone got a good idea for a general function?

EDIT: My ideal result would be the following table:

  Drug ADR pvalue1 pvalue2
1   D1  A1  pval11  pval21
2   D2  A2  pval12  pval22
3  D.. A.. pval1.. pval2..

Upvotes: 0

Views: 197

Answers (1)

Konrad Rudolph

Reputation: 546053

As in every programming problem, the solution is in two steps:

  1. Abstract your logic to make it general
  2. Encapsulate the abstract solution into a reusable function

Then you can proceed to

  3. Call the function repeatedly on all data.

However, first off: the t-tests sometimes fail due to insufficient data, so let’s replace the t.test calls with a wrapper that returns the p-value, or NA when the test raises an error:

t_test = function (x, y, ...) {
    tryCatch(t.test(x, y, ...)$p.value, error = function (err) NA)
}
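
As a quick check of what the wrapper does, here is a small sketch (the wrapper is repeated so the snippet runs on its own; the input vectors are made up for illustration). With a single observation per group, t.test raises an error and the wrapper returns NA instead:

```r
# Wrapper from above, repeated so this snippet is self-contained.
t_test = function (x, y, ...) {
    tryCatch(t.test(x, y, ...)$p.value, error = function (err) NA)
}

# One observation per group: t.test() stops with an error, so we get NA.
too_small = t_test(c(1), c(2))

# Enough observations per group: we get an ordinary p-value.
ok = t_test(c(1, 2, 3, 4), c(2, 4, 6, 9))
```

Returning NA keeps the result the same shape for every combination, which makes it easy to collect the p-values into a table afterwards.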

Then, all taken together, this gives us:

library(dplyr) # Makes data manipulation easier.

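test_combination = function (data, id) {
    # Look up the record by its ID rather than by row position.
    record = data[data$ID == id, ]
    drug = record$DRUG
    adr = record$ADR

    # X values for this combination and for the two comparison groups.
    match = filter(data, DRUG == drug, ADR == adr)$X
    mismatch1 = filter(data, DRUG != drug, ADR == adr)$X  # other drugs, same ADR
    mismatch2 = filter(data, DRUG == drug, ADR != adr)$X  # same drug, other ADRs

    list(pval1 = t_test(match, mismatch1), pval2 = t_test(match, mismatch2))
}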

This tests a single combination. Now we test all of them:

result = lapply(dat$ID, test_combination, data = dat) %>%
    bind_rows() %>%
    bind_cols(dat, .) %>%
    select(-X)

Or, using a more dplyr-like (but in my opinion somewhat obscure) approach:

result = dat %>%
    rowwise() %>%
    do(bind_rows(test_combination(dat, .$ID))) %>%
    bind_cols(dat, .) %>%
    select(-X)

Note how this code doesn’t use explicit for loops. This is how you process data in R: you apply a function to items in a table or list, rather than iterating manually.
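
The code above produces one row per record, so repeated DRUG - ADR pairs are tested more than once. If you want one row per unique combination, as in the table in the question, a base R sketch could look like the following. The helper name test_pair and the column names pvalue1 / pvalue2 are my own choices for illustration, and set.seed is only there to make the example data reproducible:

```r
set.seed(1)  # assumption: fixed seed, only so the example data is reproducible
dat = data.frame(ID = 1:100,
                 DRUG = rep(c("D1","D2","D2","D3","D3","D3","D5","D1","D4","D2"), 10),
                 ADR = rep(c("A1","A2","A3","A6","A7","A8","A4","A2","A1","A2"), 10),
                 X = sample(1:250, 100, replace = FALSE),
                 stringsAsFactors = FALSE)

t_test = function (x, y, ...) {
    tryCatch(t.test(x, y, ...)$p.value, error = function (err) NA)
}

# Hypothetical helper: both p-values for one DRUG-ADR pair, via logical indexing.
test_pair = function (data, drug, adr) {
    same_drug = data$DRUG == drug
    same_adr = data$ADR == adr
    matched = data$X[same_drug & same_adr]
    c(pvalue1 = t_test(matched, data$X[! same_drug & same_adr]),
      pvalue2 = t_test(matched, data$X[same_drug & ! same_adr]))
}

# One row per distinct DRUG-ADR pair rather than one per record.
combos = unique(dat[, c("DRUG", "ADR")])

# mapply() calls test_pair once per pair; the result has one column per pair,
# so transpose it to get one row per pair.
pvals = t(mapply(test_pair, combos$DRUG, combos$ADR,
                 MoreArgs = list(data = dat), USE.NAMES = FALSE))

result_unique = data.frame(combos, pvals, row.names = NULL)
```

Combinations with no comparison group (for example D5 - A4 in this data, where D5 only ever occurs with A4) end up with NA in both p-value columns.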

Note that the above is highly questionable, statistically speaking. At the very least you need to perform rigorous multiple testing correction.
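
As a starting point for the correction, base R provides p.adjust. The sketch below uses made-up p-values purely for illustration; which method is appropriate depends on your study design, so treat the choice of "BH" and "bonferroni" here as an example rather than a recommendation:

```r
# Made-up p-values for illustration; NA marks a test that could not be run.
pvalue1 = c(0.001, 0.02, 0.04, NA, 0.30)

# Benjamini-Hochberg adjustment (controls the false discovery rate).
# NA entries are left as NA in the result.
adjusted_bh = p.adjust(pvalue1, method = "BH")

# Bonferroni adjustment (more conservative; controls the family-wise error rate).
adjusted_bonf = p.adjust(pvalue1, method = "bonferroni")
```

Note that the correction should be applied across all tests you ran (both p-value columns together), not to each column or each row in isolation.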

Upvotes: 1
