Reputation: 511
My results of using splines::ns
with a least-squares fit varied with no rhyme or reason that I could see, and I think I have traced the problem to the ns
function itself.
I have reduced the problem to this:
require(splines)
N <- 0
set.seed(1)
for (i in 1:100) N <- N + identical(ns(1:10,3),ns(1:10,3))
N
My results average about 39, range 34--44 or so, but I expected 100 every time. Why should the results of ns
be random? If I substitute bs
for ns
in both places, I get 100, as expected. My set.seed(1)
hopes to demonstrate that the randomness I get is not what R intended.
In a clean session, using RStudio and R version 2.14.2 (2012-02-29), I get 39, 44, 38, etc. Everyone else seems to be getting 100.
Further info:
Substituing splines::ns
for ns
gives the same results. A clean vanilla session gives the same results. My computer has 8 cores.
The differences, when they happen, are generally or always 2^-54:
Max <- 0
for (i in 1:1000) Max <- max( Max, abs(ns(1:10,3)-ns(1:10,3)) )
c(Max,2^-54)
with result [1] 5.551115e-17 5.551115e-17
. This variability causes me big problems down the line, because my optimize(...)$min
now varies sometimes even in the first digit, making results not repeatable.
My sessionInfo with a clean vanilla session:
I created what I understand to be known as a clean vanilla session using
> .Last <- function() system("R --vanilla")
> q("no")
This blows away the session, and when I restart it, I get my clean vanilla session. Then, in answer to Ben Bolker's clarifying question, I did this at the beginning of my clean vanilla session:
> sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Revobase_6.1.0 RevoMods_6.1.0 RevoScaleR_3.1-0 lattice_0.20-0
[5] rpart_3.1-51
loaded via a namespace (and not attached):
[1] codetools_0.2-8 foreach_1.4.0 grid_2.14.2 iterators_1.0.6
[5] pkgXMLBuilder_1.0 revoIpe_1.0 tools_2.14.2 XML_3.9-1.1
> require(splines)
Loading required package: splines
> N <- 0
> set.seed(1)
> for (i in 1:100) N <- N + identical(ns(1:10,3),ns(1:10,3))
> N
[1] 32
Upvotes: 3
Views: 321
Reputation: 511
This is the answer I got from REvolution Technical Suppoort (posted here with permission):
The problem here is an issue of floating point arithmetic. Revolution R uses the Intel mkl BLAS library for some computations, which differs from what CRAN-R uses and uses this library for the 'ns()' computation. In this case you will also get different results depending on whether you are doing the computation on a Intel-processor based machine or a machine with an AMD chipset.
We do ship the same BLAS and Lapack DLL's that are shipped with CRAN-R, but they are not the default ones used with Revolution R. Customers can revert the installed DLL's if they so choose and prefer, by doing the following:
1). Renaming 'Rblas.dll' to 'Rblas.dll.bak' and 'Rlapack.dll' to 'Rlapack.dll.bak' in the folder 'C:\Revolution\R-Enterprise-6.1\R-2.14.2\bin\x64'.
2). Rename the files 'Rblas.dll.0' and 'Rlapack.dll.0' in this folder to Rblas.dll and Rlpack.dll respectively.
Their suggestion worked perfectly. I have renamed these files back and forth several times, using both RStudio (with Revolution R) and Revolution R's own IDE, always with the same result: The BLAS dlls give me N==40 or so
, and the CRAN-R dlls give me N==100
.
I will probably go back to BLAS, because in my tests, it is 8-times faster for %*%
and 4-times faster for svd()
. And that is just using one of my cores (verified by the CPU usage column of the Processes tab of Windows Task Manager).
I am hoping someone with better understanding can write a better answer, for I still don't really understand the full ramifications of this.
Upvotes: 1