Reputation: 21
I have some code that creates a matrix of survey question responses, Rows - answers to the questions from a unique survey instrument, columns the individual questions. A final column has been appended with the row means. This is then passed to rcorr for evaluation. I have 15 sets of data, but only within this particular set is it throwing NaNs, and I can't see what the difference is.
m.rcorr <- rcorr(matrix, type="pearson")
A CSV of the matrix being passed is linked here. There are legitimate values of NA in some columns, as not every respondent answers each question. Other responses are 0, 25,50,75, or 100.
I get two warnings of: In sqrt(1 - h * h) : NaNs produced
on this data set.
I have 14 other sets that run without NaNs being produced that have varying occurrences of NA, and I even took a look at whether 0 was the problem, but other sets again handle those fine.
Next I stepped into rcorr, assigning my matrix to x:
type <- "pearson"
{
type <- match.arg(type)
if (!missing(y))
x <- cbind(x, y)
x[is.na(x)] <- 1e+50
storage.mode(x) <- "double"
p <- as.integer(ncol(x))
if (p < 1)
stop("must have >1 column")
n <- as.integer(nrow(x))
if (n < 5)
stop("must have >4 observations")
h <- .Fortran(F_rcorr, x, n, p, itype = as.integer(1 + (type == "spearman")), hmatrix = double(p * p), npair = integer(p * p), double(n), double(n), double(n), double(n), double(n), integer(n))
The assignment of h is where I get stuck
Error: object 'F_rcorr' not found
The package Hmisc is installed and loaded, as again, this code works 14 out of 15 times.
F_rcorr is an internal Hmisc function, according to the help, not to be called by the user or undocumented, so I'm not quite sure where to go next.
I'm looking to answer two questions:
Addendum: Using the Hmisc::: prefix as suggested in the comment, I was able to get further and found two pairs in my data that when the value of h was 1, instead of 1 - h * h evaluating to 0, it was evaluating to the two very small negative numbers. It was only in these two pairs, and didn't happen on the diagonal, or in other places where that pair was valued at 1, so I'm not sure why those two generated weirdness, since 1 - 1 * 1 should equal 0 all day long.
However, to get around that I copied the rcorr function into a new function, adding these two lines before the P assignment, and then took the sqrt of the new D that substituted the negative numbers with 0.
D <- 1 - h * h
D[D<0] <- 0
P <- matrix(2 * (1 - pt(q = abs(h) * sqrt(npair - 2)/sqrt(D), df = npair - 2)), ncol = p)
I still like to know what may be going on that created the result of tiny negative number instead of 0 in that calculation, but I believe I've found a non-harmful way of getting around it.
Upvotes: 1
Views: 1581
Reputation: 21
So I figured what the heck, and emailed Dr. Harrell, and he replied back that in the next publication of Hmisc he's going to replace sqtr(1 - h * h)
with max(0, 1-h^2)
, which would resolve it (more cleanly) as I did, in substituting 0 for the tiny negative numbers.
I'll admit I fan-girled a bit with him answering my email.
Upvotes: 1