Reputation: 537
I would like to have a general function to perform multiple t.tests on data in a data frame with the following example data:
dat <- data.frame(ID = 1:100,
                  DRUG = rep(c("D1","D2","D2","D3","D3","D3","D5","D1","D4","D2"), 10),
                  ADR = rep(c("A1","A2","A3","A6","A7","A8","A4","A2","A1","A2"), 10),
                  X = sample(1:250, 100, replace = FALSE))
Basically, I want to run two t.tests on the values of X for each unique combination of DRUG and ADR. Taking D1-A1 as an example, I want to test the X values for D1-A1 against those for D1 with any other ADR (D1-A<>1), and the X values for D1-A1 against those for any other drug with A1 (D<>1-A1). Below is my syntax for this example, but my question is how to write a general loop or function that performs both tests for each unique combination of DRUG and ADR.
x <- ifelse (dat$DRUG == "D1" & dat$ADR == "A1",dat$X, NA)
x <- x[!is.na(x)]
y <- ifelse (dat$DRUG != "D1" & dat$ADR == "A1",dat$X, NA)
y <- y[!is.na(y)]
z <- ifelse (dat$DRUG == "D1" & dat$ADR != "A1",dat$X, NA)
z <- z[!is.na(z)]
t.test(x,y)
t.test(x,z)
So for record number 4 (D3-A6) the syntax would be:
x <- ifelse (dat$DRUG == "D3" & dat$ADR == "A6",dat$X, NA)
x <- x[!is.na(x)]
y <- ifelse (dat$DRUG != "D3" & dat$ADR == "A6",dat$X, NA)
y <- y[!is.na(y)]
z <- ifelse (dat$DRUG == "D3" & dat$ADR != "A6",dat$X, NA)
z <- z[!is.na(z)]
t.test(x,y)
t.test(x,z)
Anyone got a good idea for a general function?
EDIT: My ideal result would be the following table:
  Drug ADR pvalue1 pvalue2
1 D1   A1  pval11  pval21
2 D2   A2  pval12  pval22
3 D..  A.. pval1.. pval2..
Upvotes: 0
Views: 197
Reputation: 546053
As in every programming problem, the solution comes in two steps: first write a function that handles a single combination, then apply that function to all of them.
However, first off: the t-tests sometimes fail due to insufficient data, so let’s replace the t.test calls with a wrapper that returns NA instead of raising an error:
t_test = function (x, y, ...) {
    tryCatch(t.test(x, y, ...)$p.value, error = function (err) NA)
}
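As a quick check of the wrapper, here is a small self-contained sketch. The input vectors are made up for illustration and are not taken from the question's data. With a single observation per group, t.test raises an error, and the wrapper turns that into NA:

```r
# Wrapper from the answer: return the p-value, or NA if t.test() fails.
t_test = function (x, y, ...) {
    tryCatch(t.test(x, y, ...)$p.value, error = function (err) NA)
}

# Illustrative inputs only (assumptions, not the question's data).
too_small = t_test(1, 2)                           # one observation per group: t.test() errors
enough = t_test(c(1, 2, 3, 4), c(5, 6, 7, 9))      # enough data: an ordinary p-value

is.na(too_small)    # TRUE
is.numeric(enough)  # TRUE
```

The same happens when one of the groups is empty, which is the case for any DRUG-ADR combination that has no counterpart with a different drug or a different ADR.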
Then, all taken together, this gives us:
library(dplyr) # Makes data manipulation easier.
test_combination = function (data, id) {
    # Note: `id` is used as a row index, so this assumes ID equals the row number,
    # as it does in the example data.
    drug = data[id, ]$DRUG
    adr = data[id, ]$ADR
    match = filter(data, DRUG == drug, ADR == adr)$X
    mismatch1 = filter(data, DRUG != drug, ADR == adr)$X  # other drugs, same ADR
    mismatch2 = filter(data, DRUG == drug, ADR != adr)$X  # same drug, other ADRs
    list(pval1 = t_test(match, mismatch1), pval2 = t_test(match, mismatch2))
}
This tests a single combination. Now we test all of them:
result = lapply(dat$ID, test_combination, data = dat) %>%
    bind_rows() %>%
    bind_cols(dat, .) %>%
    select(-X)
Or, using a more dplyr-like (but in my opinion somewhat obscure) approach:
result = dat %>%
    rowwise() %>%
    do(bind_rows(test_combination(dat, .$ID))) %>%
    bind_cols(dat, .) %>%
    select(-X)
Note how this code doesn’t use explicit for loops. This is how you process data in R: you apply a function to items in a table or list, rather than iterating manually.
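The result above has one row per record, so each DRUG-ADR combination appears several times. If you want one row per unique combination, as in the table sketched in the question, something along these lines might work. This is a minimal base R sketch, not a tested drop-in: `test_pair` and `result_unique` are hypothetical names introduced here, and the seed is only there to make the example reproducible.

```r
# Example data as in the question (seed added only for reproducibility).
set.seed(1)
dat = data.frame(ID = 1:100,
                 DRUG = rep(c("D1","D2","D2","D3","D3","D3","D5","D1","D4","D2"), 10),
                 ADR = rep(c("A1","A2","A3","A6","A7","A8","A4","A2","A1","A2"), 10),
                 X = sample(1:250, 100, replace = FALSE),
                 stringsAsFactors = FALSE)

# Wrapper from the answer: p-value, or NA if t.test() fails.
t_test = function (x, y, ...) {
    tryCatch(t.test(x, y, ...)$p.value, error = function (err) NA)
}

# Hypothetical helper: the same two comparisons, keyed by values instead of row ID.
test_pair = function (data, drug, adr) {
    match = data$X[data$DRUG == drug & data$ADR == adr]
    same_adr = data$X[data$DRUG != drug & data$ADR == adr]   # other drugs, same ADR
    same_drug = data$X[data$DRUG == drug & data$ADR != adr]  # same drug, other ADRs
    c(pvalue1 = t_test(match, same_adr), pvalue2 = t_test(match, same_drug))
}

# One row per unique DRUG-ADR combination.
combos = unique(dat[c("DRUG", "ADR")])
pvals = t(mapply(test_pair, drug = combos$DRUG, adr = combos$ADR,
                 MoreArgs = list(data = dat), USE.NAMES = FALSE))
result_unique = cbind(combos, pvals)
rownames(result_unique) = NULL
```

Combinations without a comparison group (for example D5-A4 in the example data) end up with NA in the corresponding p-value column.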
Note that the above is highly questionable, statistically speaking. At the very least you need to perform rigorous multiple testing correction.
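As a starting point for the correction, base R's p.adjust can be applied to a vector of p-values. The sketch below uses made-up p-values purely for illustration; which method is appropriate depends on the analysis, and Benjamini-Hochberg and Bonferroni are shown only as two common choices.

```r
# Illustrative p-values only (not results from the question's data).
pvals = c(0.001, 0.02, 0.03, 0.2, NA)

# Benjamini-Hochberg controls the false discovery rate; NA entries stay NA.
adjusted_bh = p.adjust(pvals, method = "BH")

# Bonferroni is more conservative and controls the family-wise error rate.
adjusted_bonf = p.adjust(pvals, method = "bonferroni")
```

With the table from this answer, the same call could be applied to the pval1 and pval2 columns. Keep in mind that the two tests per combination reuse the same observations, so the tests are not independent.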
Upvotes: 1