Reputation: 121
I've got a huge data set with six columns (call them A, B, C, D, E, F) and about 450,000 rows. I simply tried to find the correlation between columns A and B:
cor(A, B)
and I got
[1] NA
as a result. What can I do to fix this problem?
Upvotes: 6
Views: 2504
Reputation: 1638
You might consider using the rcorr function in the Hmisc package.
It is very fast and uses only pairwise complete observations. The returned object contains a matrix of correlations, a matrix with the number of observations used for each pair, and a matrix of p-values.
Some example code is available here:
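A minimal sketch of calling rcorr, assuming the Hmisc package is installed. The matrix m is made-up data for illustration, not the asker's data set.

```r
# Assumes Hmisc is installed, e.g. via install.packages("Hmisc").
if (requireNamespace("Hmisc", quietly = TRUE)) {
  # Hypothetical data: three numeric columns, some values missing in A.
  set.seed(3)
  m <- cbind(A = rnorm(100), B = rnorm(100), C = rnorm(100))
  m[1:5, "A"] <- NA

  # rcorr expects a numeric matrix and handles NAs pairwise.
  res <- Hmisc::rcorr(m, type = "pearson")

  print(res$r)  # correlation matrix
  print(res$n)  # number of observations used for each pair
  print(res$P)  # p-values
}
```

Because rcorr expects a matrix, a data frame would need as.matrix() first.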
Upvotes: 4
Reputation: 20570
Try cor(A, B, use = "pairwise.complete.obs"). That will ignore the NAs in your observations.
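A small sketch of why the NA appears and what the use argument changes. The vectors A and B here are simulated stand-ins, not the original data.

```r
# Hypothetical data: two numeric vectors with a few missing values.
set.seed(42)
A <- rnorm(1000)
B <- 0.5 * A + rnorm(1000)
A[c(3, 50)] <- NA
B[c(7, 50)] <- NA

# Default (use = "everything"): a single NA makes the result NA.
print(cor(A, B))

# Drop every pair in which either value is missing.
r <- cor(A, B, use = "pairwise.complete.obs")
print(r)

# The same thing by hand: keep only positions where both values are present.
ok <- complete.cases(A, B)
r_manual <- cor(A[ok], B[ok])
```

Here r and r_manual agree, since the option simply restricts the calculation to the pairs where both values are observed.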
To be statistically rigorous, you should also check how many entries are missing in your data and whether the missing-at-random assumption holds.
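One way to look at the missing entries, sketched on a made-up data frame df. The last step is only a rough heuristic, not a formal test of the missingness assumption.

```r
# Hypothetical data frame standing in for the real data set.
set.seed(1)
df <- data.frame(A = rnorm(200), B = rnorm(200), C = rnorm(200))
df$A[1:10] <- NA
df$B[6:25] <- NA

# Number and share of missing entries per column.
n_missing <- colSums(is.na(df))
share_missing <- colMeans(is.na(df))
print(n_missing)
print(share_missing)

# Number of rows that would actually be used for the A-B correlation.
n_used <- sum(complete.cases(df$A, df$B))
print(n_used)

# Rough check: compare the mean of C where B is observed vs. missing.
# A large gap would suggest that missingness in B is related to C.
print(tapply(df$C, is.na(df$B), mean))
```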
Edit 1: Take a look at ?cor to see other options for the use parameter.
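As a brief illustration of how some of the use options differ once more than two columns are involved, here is a sketch on a made-up matrix m.

```r
# Hypothetical three-column data with missing values in different rows.
set.seed(7)
m <- cbind(A = rnorm(100), B = rnorm(100), C = rnorm(100))
m[1:5, "A"] <- NA
m[6:10, "C"] <- NA

# "everything" (the default): NAs propagate into every affected entry.
print(cor(m))

# "complete.obs": drop any row with an NA in any column, then correlate.
c_complete <- cor(m, use = "complete.obs")

# "pairwise.complete.obs": each pair uses all rows complete for that pair.
c_pairwise <- cor(m, use = "pairwise.complete.obs")

# "all.obs": stops with an error if any NA is present.
res <- try(cor(m, use = "all.obs"), silent = TRUE)
```

With only two vectors, "complete.obs" and "pairwise.complete.obs" give the same answer. With a matrix they can differ, because "complete.obs" discards a row for every pair as soon as any column in it is missing.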
Upvotes: 13