Reputation: 153
I found part of the solution to my problem here: How to calculate correlation In R
library(plyr)  # provides ddply() and .()
set.seed(123)
X <- data.frame(ID = rep(1:2, each = 5), a = sample(1:10), b = sample(1:10))
ddply(X, .(ID), summarize, cor_a_b = cor(a, b))
In addition to cor (which calculates Pearson's r), I calculate cor.test (for the p-value). But this fails with "not enough finite observations" when an ID occurs only once, which happens quite often in my data.
So I need to calculate r only if there are more than 30 or so pairs of data; if there are fewer, I want NA.
The second problem is that the verbose output of cor.test inflates the resulting data frame, even though the only thing I want is the p-value. That is, if p actually is what I understand it to be: is it the significance of r?
I only know the t-test for calculating the significance of r.
(Formula for the t statistic: t = r·(n-2)^0.5 / (1-r^2)^0.5. But t is not the significance yet; otherwise I would try to implement the formula in the ddply statement.)
Upvotes: 1
Views: 1033
Reputation: 66862
Try this:
> d <- data.frame(id = rep(1:3, c(5, 1, 10)), a = rnorm(16), b = rnorm(16))
> ddply(d, .(id), summarize, cor_a_b = if(length(id) < 3) {NA} else {cor.test(a, b)$p.value})
  id   cor_a_b
1  1 0.4393595
2  2        NA
3  3 0.5602855
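If you want both r and the p-value, with the "30 or so" cutoff from the question, something along these lines might work. This is only a sketch: min_n, res, and the simulated data frame d are names and values I made up for illustration, and the cutoff of 30 is simply taken from the question. The last few lines also check, for one group, that the p-value from cor.test is the two-sided tail probability of the t statistic from the question under a t distribution with n - 2 degrees of freedom, which is how I understand the default Pearson test to work.

```r
library(plyr)

set.seed(123)
# Simulated data (made up): groups of 40 pairs, 1 pair (a "solo" ID), and 35 pairs
d <- data.frame(id = rep(1:3, c(40, 1, 35)), a = rnorm(76), b = rnorm(76))

min_n <- 30  # hypothetical cutoff, taken from the question's "30 or so"

# For each id, return r and its p-value, or NA for both when the group is too small
res <- ddply(d, .(id), function(g) {
  if (nrow(g) < min_n) {
    return(data.frame(r = NA_real_, p = NA_real_))
  }
  ct <- cor.test(g$a, g$b)  # Pearson by default
  data.frame(r = unname(ct$estimate), p = ct$p.value)
})
print(res)

# Check for group 1: convert t = r * sqrt(n - 2) / sqrt(1 - r^2) into a
# two-sided p-value and compare it with what cor.test reports
g1 <- d[d$id == 1, ]
n <- nrow(g1)
r <- cor(g1$a, g1$b)
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_manual <- 2 * pt(-abs(t_stat), df = n - 2)
print(all.equal(p_manual, cor.test(g1$a, g1$b)$p.value))
```

Passing a function that returns a one-row data frame, instead of using summarize, keeps only the two numbers you need per ID, so the verbose cor.test object never ends up in the result.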
Upvotes: 4