Reputation: 430
I'm trying to find a good way to test whether or not the correlation between two vectors is perfect (or NA.) I've tried lots of different methods, but I'm having a similar problem with all of them, namely that the result of correlation doesn't evaluate in the way I'd expect.
This is my latest example:
foo1 <- c(4, NA, 6, NA)
foo2 <- c(1, 2, 3, 4)
set <- c(-1, 1, NA)
correlation <- cor(foo1, foo2, use = "na.or.complete") # Result: 1
correlation %in% set # Should be TRUE, is FALSE
correlation == 1 # Should also be TRUE, but is FALSE
is.numeric(correlation) is TRUE. The only value I can see in it or around it is 1. Then whyyy in the world does this not work?
set <- c('-1', '1', NA)
This works, but I'm not sure why, and I'm worried there are ways that it might fail because I clearly don't understand what's going on with the returned value.
Any insight at all would be helpful!
Upvotes: 2
Views: 221
Reputation: 430
You people were super helpful. For the interested, here's what the final result is, although I clearly have to play with my tolerances a little bit more.
corrIsOkay <- function(x, y){
correlation <- cor(x, y, use = "na.or.complete")
if ((1 - abs(correlation)) <= .01 | is.na(correlation)){
print(correlation)
return(FALSE)
}
return(TRUE)
}
It's a lot wordier than I would have liked, since my original "solution" fit inside an if statement by itself, but now I just call this function in the if statement.
makeMvMissing <- function(data) {
repeat {
x <- makeMissing(data)[, 1]
y <- makeMissing(data, variable.missing = "y")[, 2]
if (corrIsOkay(x, y)){
break
}
}
return(data.frame(x, y))
}
Upvotes: 1
Reputation: 226182
I think this is basically FAQ 7.31, but "how do I see if a value is within a set (within tolerance)" requires a slight extension of the standard "just use all.equal()
" answer ...
test <- function(x) {
isTRUE(all.equal(x,-1)) ||
isTRUE(all.equal(x,1)) || is.na(x)
}
test(1-1e-14) ## TRUE
test(NA) ## TRUE
test(0.88) ## FALSE
is the most precise solution to your question, although I can see that it would get difficult if you had a much longer list of candidates ... since isTRUE(all.equal(1,NA))
is FALSE
(as is convenient) perhaps
test <- function(x,candidates=c(-1,1,NA), ...) {
any(sapply(lapply(candidates,all.equal,target=x,...),
isTRUE))
}
test(1-1e-6) ## FALSE
test(1-1e-6,tolerance=1e-4) ## TRUE
The one wrinkle here is that isTRUE(all.equal(NA,NaN))
is not TRUE
, so one might want to build is.na()
(which tests for either NA
or NaN
) in here somewhere, or include NaN
in the list of candidates.
Upvotes: 2
Reputation: 66819
Instead, you can make sure that they are close enough to your set:
mytol <- 1e-10
set <- set[1:2]
any(abs(correlation - set) <= mytol)|is.na(correlation) # TRUE
It's to do with floats and tolerance, I guess. I'm not sure what the standard reference is, but here's one of them: Numeric comparison difficulty in R
1 is not really 1; if you wanted an integer (and here I guess you shouldn't), you could use 1L
. Vectors created like 1:3
and seq(1,3)
are also integers. Look at ?integer
and ?numeric
for more information. Strangely, I can't find a doc page that covers the differences between the classes.
EDIT: I split off the check for NA because, as the OP pointed out, it didn't work.
Upvotes: 2