R - In sqrt(1 - h * h) : NaNs produced from within rcorr - full sample data available

Question

I have some code that creates a matrix of survey question responses, Rows - answers to the questions from a unique survey instrument, columns the individual questions. A final column has been appended with the row means. This is then passed to rcorr for evaluation. I have 15 sets of data, but only within this particular set is it throwing NaNs, and I can't see what the difference is.

m.rcorr <- rcorr(matrix, type="pearson")

A CSV of the matrix being passed is linked here. There are legitimate values of NA in some columns, as not every respondent answers each question. Other responses are 0, 25,50,75, or 100.

I get two warnings of: In sqrt(1 - h * h) : NaNs produced on this data set.

I have 14 other sets that run without NaNs being produced that have varying occurrences of NA, and I even took a look at whether 0 was the problem, but other sets again handle those fine.

Next I stepped into rcorr, assigning my matrix to x:

type <- "pearson"
{
  type <- match.arg(type)
  if (!missing(y)) 
    x <- cbind(x, y)
  x[is.na(x)] <- 1e+50
  storage.mode(x) <- "double"
  p <- as.integer(ncol(x))
  if (p < 1) 
    stop("must have >1 column")
  n <- as.integer(nrow(x))
  if (n < 5) 
    stop("must have >4 observations")
  h <- .Fortran(F_rcorr, x, n, p, itype = as.integer(1 + (type == "spearman")), hmatrix = double(p * p), npair = integer(p * p), double(n), double(n), double(n), double(n), double(n), integer(n))

The assignment of h is where I get stuck

Error: object 'F_rcorr' not found

The package Hmisc is installed and loaded, as again, this code works 14 out of 15 times.

F_rcorr is an internal Hmisc function, according to the help, not to be called by the user or undocumented, so I'm not quite sure where to go next.

I'm looking to answer two questions:

Why is this particular set throwing the NaNs
What impact does that have on the final values of the list $r from the rcorr results.

Addendum: Using the Hmisc::: prefix as suggested in the comment, I was able to get further and found two pairs in my data that when the value of h was 1, instead of 1 - h * h evaluating to 0, it was evaluating to the two very small negative numbers. It was only in these two pairs, and didn't happen on the diagonal, or in other places where that pair was valued at 1, so I'm not sure why those two generated weirdness, since 1 - 1 * 1 should equal 0 all day long.

However, to get around that I copied the rcorr function into a new function, adding these two lines before the P assignment, and then took the sqrt of the new D that substituted the negative numbers with 0.

 D <- 1 - h * h
 D[D<0] <- 0
 P <- matrix(2 * (1 - pt(q = abs(h) * sqrt(npair - 2)/sqrt(D), df = npair - 2)), ncol = p)

I still like to know what may be going on that created the result of tiny negative number instead of 0 in that calculation, but I believe I've found a non-harmful way of getting around it.

R - In sqrt(1 - h * h) : NaNs produced from within rcorr - full sample data available

Answers (1)

Related Questions