R computed percentages do not sum up to one - precision issue

I am trying to work with percentages in R and I am running into a strange issue. When I convert the values of a vector to percentages of the vector's sum, the results often do not add up to one.

Minimal example:

data <- rnorm(1000)*100
max <- 50
# for each of the 20 blocks of 50 values: sum the block's shares, then subtract 1
unlist(lapply(0:(1000/max-1), 
     function(i) 
        sum(
            data[(i*max+1):((i+1)*max)]
            /
            sum(data[(i*max+1):((i+1)*max)])
           )
        ))-1

It should give a vector of zeros, but I am getting this:

[1]  0.000000e+00  0.000000e+00 -1.110223e-16 -1.110223e-16  0.000000e+00 -1.110223e-16  0.000000e+00  0.000000e+00  0.000000e+00
[10]  0.000000e+00  0.000000e+00  2.220446e-16  0.000000e+00 -4.440892e-16  0.000000e+00  0.000000e+00  0.000000e+00  4.440892e-16
[19] -1.110223e-16  0.000000e+00

Any idea for remedy?
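A sketch of the same check with explicit, non-overlapping blocks, assuming the intent is to split the 1000 values into 20 consecutive blocks of 50. The names `block_size`, `groups` and `deviation` are illustrative, and `set.seed()` is added only to make the run reproducible:

```r
set.seed(1)                       # reproducible random numbers
data <- rnorm(1000) * 100
block_size <- 50                  # illustrative name; avoids masking base::max()

# Group labels 1,1,...,2,2,...,20: one label per block of 50 values
groups <- rep(seq_len(length(data) / block_size), each = block_size)
blocks <- split(data, groups)

# Sum of each block's shares minus 1; this is exactly 0 only in exact arithmetic
deviation <- vapply(blocks, function(b) sum(b / sum(b)), numeric(1)) - 1
print(deviation)
```

Any nonzero entries here are floating-point rounding error of the kind discussed in the answers.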

Answers (2)

IRTFM

They are off by an insignificant amount. If you want to change how these tiny differences, which are inherent in floating-point arithmetic, are displayed, you can use the format() function or one of its cousins such as sprintf() or formatC(). This is really an instance of R FAQ 7.31. If you do want help with formatting, you should describe a particular application. If you want to force the results to display as zeroes, you can also use round():

round( unlist(lapply(0:(1000/max-1), 
 function(i) 
    sum(
        data[(i*max+1):((i+1)*max)]
        /
        sum(data[(i*max+1):((i+1)*max)])
       )
    ))-1  , digits=4)
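The formatting route mentioned above can be sketched like this, on a hypothetical stand-in vector `tmp` holding values like those in the question. Rounding first and then adding 0 is a common trick (an addition here, not part of the answer): `sprintf("%.4f", -1.110223e-16)` can print as `"-0.0000"`, and adding 0 turns a negative zero into a positive one:

```r
# Hypothetical stand-in for the question's output vector
tmp <- c(0, -1.110223e-16, 2.220446e-16, -4.440892e-16)

# Round to 4 decimals, then add 0 so any -0 becomes +0 (no "-0.0000")
clean <- round(tmp, 4) + 0

# Three ways to get fixed-point text with 4 decimals
sprintf("%.4f", clean)
formatC(clean, format = "f", digits = 4)
format(clean, nsmall = 4)
```

These only change how the numbers are shown; the underlying doubles in `tmp` are untouched.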

Gavin Simpson

A more important question is why do you think these should be 0?

You are using floating-point arithmetic, and not all numbers can be represented exactly on your computer. This is covered by (or related to) R FAQ 7.31, which explains the phenomenon.
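A short illustration of the FAQ 7.31 behaviour in base R. The last step divides the deviations printed in the question by `.Machine$double.eps`, the gap between 1 and the next larger double; the ratios come out at roughly -0.5, 1 and -2, meaning the computed sums landed only a few representable doubles away from 1:

```r
# 0.1, 0.2 and 0.3 have no exact binary representation,
# so the stored sum differs from the stored 0.3
0.1 + 0.2 == 0.3             # FALSE
sprintf("%.20f", 0.1)        # digits of the double actually stored for 0.1

# Gap between 1 and the next larger double (2^-52)
.Machine$double.eps

# Deviations printed in the question, in units of that gap
ratios <- c(-1.110223e-16, 2.220446e-16, -4.440892e-16) / .Machine$double.eps
print(ratios)
```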

You can either ignore it (for all intents and purposes, these values are 0):

> all.equal(tmp, rep(0, length(tmp))) ## tmp contains your numbers
[1] TRUE

or learn to deal with it appropriately for your particular operation. One way is simply to round them to some number of decimal places:

> round(tmp, 2)
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> round(tmp, 3)
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> round(tmp, 4)
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> round(tmp, 5)
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

But it does depend what you want to do with these numbers.
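When the check has to drive code rather than just be read at the console, note that `all.equal()` returns either `TRUE` or a character string describing the difference, so it is usually wrapped in `isTRUE()`. A sketch on a hypothetical stand-in `tmp`, with an element-wise alternative whose threshold `tol` is an assumption (here set to the same value as `all.equal()`'s default tolerance):

```r
# Hypothetical stand-in for the question's output vector
tmp <- c(0, -1.110223e-16, 2.220446e-16, -4.440892e-16)

# isTRUE() turns all.equal()'s "TRUE or description" result into TRUE/FALSE
if (isTRUE(all.equal(tmp, rep(0, length(tmp))))) {
  message("deviations are negligible")
}

# Element-wise test against an explicit tolerance (about 1.5e-8)
tol <- sqrt(.Machine$double.eps)
near_zero <- abs(tmp) < tol
print(near_zero)

# An exact comparison still fails, which is why == is avoided here
identical(tmp, rep(0, length(tmp)))
```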
