Reputation: 59980
In R I am finding some odd behaviour that I can't explain and I am hoping someone here can. I believe that the value of 100! is this big number.
A few lines from the console showing expected behaviour...
>factorial( 10 )
[1] 3628800
>prod( 1:10 )
[1] 3628800
> prod( as.double(1:10) )
[1] 3628800
> cumprod( 1:10 )
[1] 1 2 6 24 120 720 5040 40320 362880 3628800
However when I try 100! I get (notice how the resulting numbers begin to differ about 14 digits in):
> options(scipen=200) #set so the whole number shows in the output
> factorial(100)
[1] 93326215443942248650123855988187884417589065162466533279019703073787172439798159584162769794613566466294295348586598751018383869128892469242002299597101203456
> prod(1:100)
[1] 93326215443944102188325606108575267240944254854960571509166910400407995064242937148632694030450512898042989296944474898258737204311236641477561877016501813248
> prod( as.double(1:100) )
[1] 93326215443944150965646704795953882578400970373184098831012889540582227238570431295066113089288327277825849664006524270554535976289719382852181865895959724032
> all.equal( prod(1:100) , factorial(100) , prod( as.double(1:100) ) )
[1] TRUE
If I do some tests against a variable set to the 'known' number of 100! then I see the following:
# This is (as far as I know) the 'true' value of 100!
> n<- as.double(93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000)
> factorial(100) - n
[1] -1902315522848807765998160811905210717565551993186466795054798772271710903343294674760811531554315419925519536152107160826913610179566298858520576
> prod(1:100) - n
[1] -48777321098687378615337456715518223527321845979140174232174327494146433419058837814379782860367062049372295798771978482741374619988879457910784
> prod(as.double(1:100)) - n
[1] 0
The final result evaluates to zero, but the number returned for prod( as.double( 1:100 ) ) does not display as I would expect, even though prod( as.double( 1:100 ) ) - n evaluates correctly, where n is a variable set to the value of 100!.
Can anyone explain this behaviour to me, please? As far as I am aware it should not be related to overflow, since I am using an x64 system. Version and machine info below:
> .Machine$double.xmax
[1] 1.798e+308
> str( R.Version() )
List of 14
$ platform : chr "x86_64-apple-darwin9.8.0"
$ arch : chr "x86_64"
$ os : chr "darwin9.8.0"
$ system : chr "x86_64, darwin9.8.0"
$ status : chr ""
$ major : chr "2"
$ minor : chr "15.2"
$ year : chr "2012"
$ month : chr "10"
$ day : chr "26"
$ svn rev : chr "61015"
$ language : chr "R"
$ version.string: chr "R version 2.15.2 (2012-10-26)"
$ nickname : chr "Trick or Treat"
Can anyone explain this to me? I don't doubt that R does everything correctly; this is most likely useR related. You might point out that since prod( as.double( 1:100 ) ) - n evaluates correctly I have nothing to be bothered about, but I am doing Project Euler Problem 20, so I needed the correct digits displayed.
Thanks
Upvotes: 23
Views: 6221
Reputation: 8403
Well, you can tell from the body of factorial that it calls gamma, which in turn calls .Primitive("gamma"). What does .Primitive("gamma") look like? Like this.
For large inputs, .Primitive("gamma")'s behaviour is on line 198 of that code. It's calling
exp((y - 0.5) * log(y) - y + M_LN_SQRT_2PI +
((2*y == (int)2*y)? stirlerr(y) : lgammacor(y)));
which is just an approximation.
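As a quick check in base R (no extra packages), factorial really is just the gamma function shifted by one, so both routes return the same double:

```r
# factorial(x) is implemented in base R as gamma(x + 1), so for large
# arguments both go through the same Stirling-series approximation.
identical(factorial(100), gamma(101))   # TRUE
```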
By the way, the article on Rmpfr uses factorial as its example. So if you're trying to solve the problem, "just use the Rmpfr library".
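For instance, a minimal sketch along those lines (assuming the Rmpfr package is installed; factorialMpfr picks enough precision bits by default):

```r
library(Rmpfr)

# 100! needs ceiling(lfactorial(100) / log(2)) = 525 bits to be exact;
# factorialMpfr() defaults to at least that many, so this prints all
# 158 digits of 100! correctly.
factorialMpfr(100)
```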
Upvotes: 1
Reputation: 10755
I will add a third answer just to describe graphically the behaviour you are encountering. Essentially, double precision is sufficient for exact factorial calculation up to 22!; beyond that it diverges more and more from the real value.
Around 50! there is a further distinction between the two methods, factorial(x) and prod(1:x), with the latter yielding, as you indicated, values closer to the "real" factorial.
Code attached:
# Precision of factorial calculation (very important for Fisher's Exact Test)
library(gmp)

perfectprecision <- list()  # exact big-integer factorials via gmp
singleprecision <- c()      # factorial(x), i.e. the gamma-function route
doubleprecision <- c()      # prod(1:x), i.e. the running-product route
for (x in 1:100) {
  perfectprecision[[x]] <- factorialZ(x)
  singleprecision <- c(singleprecision, factorial(x))
  doubleprecision <- c(doubleprecision, prod(1:x))
}

plot(0, col = "white", xlim = c(1, 100),
     ylim = c(0, log10(abs(doubleprecision[100] - singleprecision[100]) + 1)),
     ylab = "Log10 Absolute Difference from Big Integer", xlab = "x!")
for (x in 1:100) {
  points(x, log10(abs(perfectprecision[[x]] - singleprecision[x]) + 1), pch = 16, col = "blue")
  points(x, log10(abs(perfectprecision[[x]] - doubleprecision[x]) + 1), pch = 20, col = "red")
}
legend("topleft", col = c("blue", "red"), legend = c("factorial(x)", "prod(1:x)"), pch = c(16, 20))
Upvotes: 7
Reputation: 74415
Your test with all.equal does not produce what you expect. all.equal can only compare two values. The third argument is positionally matched to tolerance, which sets the tolerance of the comparison. In your invocation of all.equal you give it a tolerance of 100!, which definitely leads to the comparison being true for absurdly differing values:
> all.equal( 0, 1000000000, prod(as.double(1:100)) )
[1] TRUE
But even if you give it only two arguments, e.g.
all.equal( prod(1:100), factorial(100) )
it would still produce TRUE, because the default tolerance is .Machine$double.eps ^ 0.5, i.e. the two operands only have to match to about 8 digits, which is definitely the case here. On the other hand, if you set the tolerance to 0, none of the three possible pairings emerges as equal from the comparison:
> all.equal( prod(1:100), factorial(100), tolerance=0.0 )
[1] "Mean relative difference: 1.986085e-14"
> all.equal( prod(1:100), prod( as.double(1:100) ), tolerance=0.0 )
[1] "Mean relative difference: 5.22654e-16"
> all.equal( prod(as.double(1:100)), factorial(100), tolerance=0.0 )
[1] "Mean relative difference: 2.038351e-14"
Also note that just because you've told R to print 200 significant digits doesn't mean that they are all correct. Indeed, the decimal expansion of 1/2^53 has 53 digits, but only the first 16 or so are meaningful.
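To put a number on how much of the printout is noise (a base-R sketch): adjacent representable doubles near 100! are 2^472 apart, so roughly the last 142 decimal digits R prints carry no information:

```r
# 100! needs 525 bits; a double keeps only the top 53 of them, so the
# unit in the last place (ulp) near 100! is 2^(525 - 53) = 2^472.
ulp <- 2^(525 - 53)
ulp   # about 1.2e142: printed digits below this magnitude are conversion noise
```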
This also makes your comparison to the "true" value flawed. Observe: the ending digits in what R gives you for factorial(100) are
...01203456
You subtract n from it, where n is the "true" value of 100!; since n ends in 24 zeroes, the difference should end with the same digits that factorial(100) does. But instead it ends with
...58520576
This only shows that all those digits are non-significant, and one should not read anything into their value.
It takes 525 bits of binary precision to represent 100! exactly - that's roughly 10x the 53-bit precision of a double.
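That 525-bit figure is easy to verify in base R without overflowing, by summing the logs of the factors:

```r
# log2(100!) = log2(1) + log2(2) + ... + log2(100), computed without
# ever forming the huge product itself.
bits <- sum(log2(1:100))
bits            # about 524.76
ceiling(bits)   # 525 bits needed for an exact representation
```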
Upvotes: 13
Reputation: 336328
This has to do not with the maximum value for a double but with its precision. 100! has 158 significant (decimal) digits. IEEE doubles (64 bit) store 52 bits of mantissa (53 significant bits counting the implicit leading bit), so you get rounding errors once about 16 decimal digits of precision have been exceeded.
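A small base-R illustration of that limit: integers stay exact only up to 2^53, after which neighbouring doubles are more than 1 apart:

```r
(2^53 - 1) == 2^53    # FALSE: below 2^53, consecutive integers are still distinct
(2^53 + 1) == 2^53    # TRUE: 2^53 + 1 is not representable and rounds back down
.Machine$double.eps   # about 2.2e-16, i.e. roughly 16 decimal digits
```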
Incidentally, 100! is in fact, as you suspected,
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
so all of the values R calculated are incorrect.
Now I don't know R, but it seems that all.equal() converts all three of those values to floats before comparing, and so their differences are lost.
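Whatever the mechanism, an exact comparison with == (which applies no tolerance) confirms the three values really are three different doubles:

```r
factorial(100) == prod(1:100)              # FALSE
factorial(100) == prod(as.double(1:100))   # FALSE
prod(1:100) == prod(as.double(1:100))      # FALSE
```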
Upvotes: 15