Ricardo Saporta

bit64 - sum() on a vector of NA produces odd results

When using the bit64 package, summing a vector of NAs together with a vector of integers yields an incorrect result. Depending on whether the NA vector is passed first or last, the result is either twice the correct answer or 0, respectively.

Notice that converting the NA vector away from integer64 will remove the issue.

However, when experimenting with other small values in place of y, the results were awfully strange. For example:

40 + 35 = 70    but
35 + 40 = 80

Any thoughts as to what is going on?

EXAMPLE:

  library(bit64)

  x <- as.integer64(c(20, 20))
  y <- as.integer64(c(NA, NA))

  sum(y, x, na.rm=TRUE)
  # integer64
  # [1] 80   # <~~~ Twice the correct value

  sum(x, y, na.rm=TRUE)
  # integer64
  # [1] 0   # <~~~~ Incorrect 0.  Should be 40. 

  ## Removing the NAs does not help. 
  y <- y[!is.na(y)]

  ## A vector of 0's gives the same issue
  y <- as.integer64(c(0, 0))

  ## Same results
  sum(y, x, na.rm=TRUE)
  # integer64
  # [1] 80

  sum(x, y, na.rm=TRUE)
  # integer64
  # [1] 0

  ## Converting to numeric does away with the issue (but is not a viable workaround, for obvious reasons)
  y <- as.numeric(y)

  ## y is now a plain numeric and comes first, so sum() does not dispatch to sum.integer64
  sum(y, x, na.rm=TRUE)
  # [1] 1.97626e-322

  sum.integer64(y, x, na.rm=TRUE)
  # integer64
  # [1] 40

  sum(x, y, na.rm=TRUE)
  # integer64
  # [1] 40

Give y a single non-NA value, and the results are also wrong:

  y <- as.integer64(c(35, NA, NA))
  sum.integer64(x, if (!all(is.na(y))) removeNA(y), na.rm=TRUE)
  sum.integer64(x, y[[1]], na.rm=TRUE)
  sum.integer64(y[[1]], x, na.rm=TRUE)

  ## No NA's present
  sum.integer64(as.integer64(35), x)
  # integer64
  # [1] 80
  sum.integer64(x, as.integer64(35))
  # integer64
  # [1] 70

Answers (1)

user3710546

Not an answer, but an exploration. Hope it might help you.

From the sum.integer64 function of the bit64 package:

function (..., na.rm = FALSE) 
{
    l <- list(...)
    ret <- double(1)
    if (length(l) == 1) {
        .Call("sum_integer64", l[[1]], na.rm, ret)
        oldClass(ret) <- "integer64"
        ret
    }
    else {
        ret <- sapply(l, function(e) {
            if (is.integer64(e)) {
                .Call("sum_integer64", e, na.rm, ret)
                ret
            }
            else {
                as.integer64(sum(e, na.rm = na.rm))
            }
        })
        oldClass(ret) <- "integer64"
        sum(ret, na.rm = na.rm)
    }
}
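
The listing has two branches, and only the multi-argument one goes through sapply. A minimal sketch of a possible workaround, assuming the one-argument branch behaves correctly in your installed version: concatenate the vectors with c() first so that sum() receives a single integer64 argument. The names total and total_rev are only illustrative.

```r
library(bit64)

x <- as.integer64(c(20, 20))
y <- as.integer64(c(NA, NA))

# c() on integer64 vectors returns one integer64 vector, so sum()
# sees a single argument and takes the length(l) == 1 branch.
total     <- sum(c(x, y), na.rm = TRUE)
total_rev <- sum(c(y, x), na.rm = TRUE)

print(total)      # expected: 40
print(total_rev)  # expected: 40, independent of argument order
```

This keeps everything in integer64, unlike the as.numeric conversion tried in the question.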

Here is your example:

library(bit64)
x <- as.integer64(c(20, 20))
y <- as.integer64(c(NA, NA))

na.rm <- TRUE
l <- list(y, x)
ret <- double(1)
ret
#[1] 0

# We use the sapply function as in the function:
ret <- sapply(l, function(e) { .Call("sum_integer64", e, na.rm, ret) })
oldClass(ret) <- "integer64"
ret
#integer64
#[1] 40 40      <-- twice the value "40"
sum(ret, na.rm = na.rm)
# integer64
#[1] 80         <-- twice the expected value, as you said

Here we decompose the calculation, for each vector:

ret <- double(1)
ret2 <- NULL
ret2[1] <- .Call("sum_integer64", y, na.rm, ret)
ret2[2] <- .Call("sum_integer64", x, na.rm, ret)
oldClass(ret2) <- "integer64"
ret2
#integer64
#[1] 0  40      <-- only once the value "40", and "0" because of the NAs
sum(ret2, na.rm = na.rm)
#integer64
#[1] 40         <- expected value
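
One possible reading of the two runs above, offered as a guess rather than a confirmed diagnosis: if the C routine overwrites ret in place and every sapply iteration hands back that same object, then all collected results show whatever was written last. The sketch below only mimics that behaviour in base R, with an environment standing in for the shared buffer; shared_sum and slot are hypothetical names and none of this is bit64 code.

```r
# Analogy only: an environment plays the role of the shared `ret` buffer.
# Assumption (not verified against the C source): each call overwrites the
# same buffer and returns a reference to it rather than a copy.
shared_sum <- function(vectors) {
  slot <- new.env()
  handles <- lapply(vectors, function(v) {
    slot$value <- sum(v, na.rm = TRUE)  # overwrite the single shared slot
    slot                                # hand back a reference, not a copy
  })
  # The values are read only after every iteration has finished,
  # so each handle reports the last value written.
  sum(vapply(handles, function(h) as.numeric(h$value), numeric(1)))
}

x <- c(20, 20)
y <- c(NA_real_, NA_real_)

shared_sum(list(y, x))   # 80: both slots read 40
shared_sum(list(x, y))   # 0:  both slots read 0
shared_sum(list(35, x))  # 80: both slots read 40
shared_sum(list(x, 35))  # 70: both slots read 35
```

Under that assumption the pattern matches every number reported in the question, and it would also explain why the element-by-element version above is correct: assigning into ret2[1] copies the value before the next call overwrites it.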
