Reputation: 38652
Consider the following data:
d = data.frame(
experiment = as.factor(c("foo", "foo", "foo", "bar", "bar")),
si = runif(5),
ti = runif(5)
)
I would like to perform a correlation test for si
and ti
, for each experiment
factor level. So I thought I'd run:
ddply(d, .(experiment), cor.test)
But how do I pass the values of si
and ti
to the cor.test
call? I tried this:
> ddply(d, .(experiment), cor.test, x = si, y = ti)
Error in .fun(piece, ...) : object 'si' not found
> ddply(d, .(experiment), cor.test, si, ti)
Error in match.arg(alternative) :
'arg' must be NULL or a character vector
Is there anything obvious I'm missing? The plyr
documentation does not include any example for me. Most commands I see only involve summarize
as the function call, but doing the usual things I was used to doing from summarize
don't work, as can be seen above.
Upvotes: 3
Views: 7046
Reputation: 52637
ddply splits your data frame by the variables you select (experiment
here) and then passes the function the resulting subsets of the data frame. In your case your function cor.test
doesn't accept a data frame as an input, so you need a translation layer:
d <- data.frame(
experiment = as.factor(c("foo", "foo", "foo", "bar", "bar", "bar")),
si = runif(6),
ti = runif(6)
)
ddply(d, .(experiment), function(d.sub) cor.test(d.sub$si, d.sub$ti)$statistic)
# experiment t
# 1 bar 0.1517205
# 2 foo 0.3387682
Also, your output has to be something like a vector or a data frame, which is why I just chose $statistic
above, but you could have added multiple variables if you wanted.
Side note, I had to add a value to the input data frame as it cor.test won't run on 2 values (was the case for "bar"). If you want more comprehensive stats, you can try:
ddply(d, .(experiment), function(d.sub) {
as.data.frame(cor.test(d.sub$si, d.sub$ti)[c("statistic", "parameter", "p.value", "estimate")])
} )
# experiment statistic parameter p.value estimate
# 1 bar 0.1517205 1 0.9041428 0.1500039
# 2 foo 0.3387682 1 0.7920584 0.3208567
Note that since we're now returning something more complex than just a vector, we need to coerce it to a data.frame. If you want to include more complex values (e.g. the confidence interval, which is a two value result), you would have to simplify them first.
Upvotes: 7
Reputation: 132706
You can use summarize
for this if you don't mind running cor.test
several times for each experiment (i.e., performance isn't an issue).
#note that you need at least 3 value pairs for cor.test
set.seed(42)
d = data.frame(
experiment = as.factor(c("foo", "foo", "foo", "bar", "bar", "bar")),
si = runif(6),
ti = runif(6)
)
library(plyr)
ddply(d, .(experiment), summarize,
r=cor.test(si, ti)$estimate,
p=cor.test(si, ti)$p.value
)
# experiment r p
#1 bar 0.07401492 0.9528375
#2 foo -0.41842834 0.7251622
Upvotes: 1