Danielle McCool

Reputation: 430

Result from cor() in R doesn't logically evaluate as I'd expect

I'm trying to find a good way to test whether the correlation between two vectors is perfect (or NA). I've tried lots of different methods, but I run into the same problem with all of them: the result of cor() doesn't evaluate the way I'd expect.

This is my latest example:

foo1 <- c(4, NA, 6, NA)
foo2 <- c(1, 2, 3, 4)
set  <- c(-1, 1, NA)
correlation <- cor(foo1, foo2, use = "na.or.complete")  # prints as 1
correlation %in% set  # expected TRUE, got FALSE
correlation == 1      # expected TRUE, got FALSE

is.numeric(correlation) is TRUE, and the only value I can see when I print it is 1. So why in the world does this not work?

If I change the set to character values instead:

set  <- c('-1', '1', NA)

This works, but I'm not sure why, and I'm worried there are ways that it might fail because I clearly don't understand what's going on with the returned value.

Any insight at all would be helpful!

Upvotes: 2

Views: 221

Answers (3)

Danielle McCool

Reputation: 430

You people were super helpful. For anyone interested, here's the final result, although I clearly have to play with my tolerances a little bit more.

corrIsOkay <- function(x, y){
  correlation <- cor(x, y, use = "na.or.complete")
  # Reject correlations that are NA or within 0.01 of +/-1
  if (is.na(correlation) || (1 - abs(correlation)) <= .01){
    print(correlation)
    return(FALSE)
  }
  return(TRUE)
}
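
Since the tolerance is still being tuned, one option is to expose it as an argument. The following is only a sketch: the name corr_is_okay and the tol parameter are hypothetical, and the default of 0.01 is simply carried over from the function above.

```r
# Hypothetical variant of corrIsOkay() with the tolerance as a parameter.
corr_is_okay <- function(x, y, tol = 0.01) {
  correlation <- cor(x, y, use = "na.or.complete")
  # Not okay when the correlation is missing or within `tol` of +/-1.
  !(is.na(correlation) || (1 - abs(correlation)) <= tol)
}

corr_is_okay(c(4, NA, 6, NA), c(1, 2, 3, 4))  # FALSE: two complete pairs, |r| is (nearly) 1
corr_is_okay(c(1, 3, 2, 4), c(1, 2, 3, 4))    # TRUE: r = 0.8
corr_is_okay(c(NA_real_, NA, NA, NA), 1:4)    # FALSE: no complete pairs, r is NA
```

Returning the logical directly avoids the explicit if/return pair, at the cost of dropping the print() call used for debugging.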

It's a lot wordier than I would have liked, since my original "solution" fit inside an if statement by itself, but now I just call this function in the if statement.

makeMvMissing <- function(data) {
  repeat {
    x <- makeMissing(data)[, 1]
    y <- makeMissing(data, variable.missing = "y")[, 2]
    if (corrIsOkay(x, y)){
      break
    }
  }
  return(data.frame(x, y))
}

Upvotes: 1

Ben Bolker

Reputation: 226182

I think this is basically R FAQ 7.31 ("Why doesn't R think these numbers are equal?"), but "how do I see if a value is within a set (within tolerance)" requires a slight extension of the standard "just use all.equal()" answer:

test <- function(x) {
  isTRUE(all.equal(x, -1)) ||
    isTRUE(all.equal(x, 1)) || is.na(x)
}
test(1 - 1e-14)  ## TRUE
test(NA)         ## TRUE
test(0.88)       ## FALSE

is the most precise solution to your question, although I can see that it would get unwieldy with a much longer list of candidates. Since isTRUE(all.equal(1,NA)) is FALSE (which is convenient), perhaps:

test <- function(x, candidates = c(-1, 1, NA), ...) {
  any(sapply(lapply(candidates, all.equal, target = x, ...),
             isTRUE))
}
test(1 - 1e-6)                    ## FALSE
test(1 - 1e-6, tolerance = 1e-4)  ## TRUE

The one wrinkle here is that isTRUE(all.equal(NA,NaN)) is not TRUE, so one might want to build is.na() (which tests for either NA or NaN) in here somewhere, or include NaN in the list of candidates.
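
One way to build is.na() in might look like the sketch below. The helper name near_candidate is hypothetical; it checks for NA/NaN first and only then compares against the numeric candidates with all.equal(), passing any extra arguments (such as tolerance) through.

```r
# Hypothetical helper: TRUE if x is NA/NaN or all.equal() to any candidate.
near_candidate <- function(x, candidates = c(-1, 1), ...) {
  # is.na() is TRUE for both NA and NaN, so handle missing values up front.
  if (is.na(x)) return(TRUE)
  any(vapply(candidates,
             function(cand) isTRUE(all.equal(x, cand, ...)),
             logical(1)))
}

near_candidate(NaN)                          # TRUE
near_candidate(NA)                           # TRUE
near_candidate(1 - 1e-14)                    # TRUE
near_candidate(0.88)                         # FALSE
near_candidate(1 - 1e-6, tolerance = 1e-4)   # TRUE
```

Using vapply() with logical(1) rather than sapply() guarantees a logical vector even if the candidate list is empty.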

Upvotes: 2

Frank

Reputation: 66819

Instead of testing for exact equality, you can check that the correlation is close enough to one of the values in your set:

mytol <- 1e-10
set <- set[1:2]  # drop the NA; it is checked separately with is.na()
any(abs(correlation - set) <= mytol) || is.na(correlation)  # TRUE

It has to do with floating-point numbers and tolerance, I guess. I'm not sure what the standard reference is, but here's one of them: the question "Numeric comparison difficulty in R".
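
To see how a value can print as 1 without being equal to 1, here is a small illustration. The value x is chosen by hand for the example (it is not the actual output of cor() from the question); it is the largest double below 1.

```r
x <- 1 - 2^-53            # illustrative value: the largest double below 1
print(x)                  # default printing shows 7 significant digits: 1
sprintf("%.17g", x)       # "0.99999999999999989"
x == 1                    # FALSE: exact comparison
isTRUE(all.equal(x, 1))   # TRUE: equal within all.equal()'s default tolerance
```

Comparing against a character set presumably succeeds for a similar reason: the number is converted to text with a limited number of significant digits before matching, so the tiny difference is rounded away.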

The literal 1 is not an integer but a double; if you wanted an integer (and here I guess you shouldn't), you could use 1L. Vectors created like 1:3 and seq(1,3) are also integers. Look at ?integer and ?numeric for more information. Strangely, I can't find a doc page that covers the differences between the classes.
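
The storage types mentioned above can be checked directly with typeof(). This short sketch only uses base R:

```r
typeof(1)          # "double"
typeof(1L)         # "integer"
typeof(1:3)        # "integer"
typeof(seq(1, 3))  # "integer"
1L == 1            # TRUE: == compares the values after coercion
identical(1L, 1)   # FALSE: identical() also compares the storage type
```

Note that the integer/double distinction does not by itself explain the failed comparison in the question: 1L == 1 is TRUE, so the issue is that the correlation is a double very slightly different from 1.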

EDIT: I split off the check for NA because, as the OP pointed out, it didn't work.

Upvotes: 2
