user2503950

Reputation: 65

GotoBLAS2 with multiple cores

I am trying to use GotoBLAS2 with R 3.0 on Unix. I downloaded the GotoBLAS2 source code from the TACC web site, compiled it, and replaced libRblas.so with libgoto2.so, following the instructions at http://www.rochester.edu/college/gradstudents/jolmsted/files/computing/BLAS.pdf. Simple matrix operations in R such as "determinant" are now 20 times faster than before (I am using huge matrices), which is good. However, I can no longer use multiple cores in parallel.

For example, the code below runs forever. But if I use the commented-out "for" instead of "foreach", it takes just a second. When I was using R's default BLAS library, I could run the code below on multiple cores (though it took longer, of course, since that BLAS was not optimized).

library("foreach")
library("doParallel")

registerDoParallel(cores=2)
set.seed(100)

foreach (i = 1:2) %dopar% {
# for (i in 1:2) {
  a = replicate(1000, rnorm(1000))
  d = determinant(a)
}

So, is it possible to use multiple cores at the same time with GotoBLAS2? Do you have any ideas?

Thanks a lot in advance.

Upvotes: 1

Views: 519

Answers (2)

Steve Weston

Reputation: 19677

If the parallel tasks execute a multi-threaded operation, you should make sure that the number of doParallel workers times the number of threads used by your BLAS library is no greater than the number of cores. But you may be hitting a different problem caused by GotoBLAS2.
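The sizing rule at the start of this answer can be sketched in a few lines of R. This is only an illustration: max_workers is a hypothetical helper, and the BLAS thread count of 2 is an assumed value that you would replace with whatever your BLAS build is actually configured to use.

```r
library(parallel)

# Hypothetical helper: the largest worker count such that
# workers * blas_threads does not exceed total_cores (never below 1).
max_workers <- function(total_cores, blas_threads) {
  stopifnot(total_cores >= 1, blas_threads >= 1)
  max(1L, as.integer(total_cores %/% blas_threads))
}

cores <- detectCores()          # can return NA on some platforms
if (is.na(cores)) cores <- 1L

blas_threads <- 2L              # assumption: threads used by the BLAS library
workers <- max_workers(cores, blas_threads)

message("cores: ", cores, ", BLAS threads: ", blas_threads,
        ", workers to register: ", workers)
```

The resulting value could then be passed as the cores argument of registerDoParallel.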

The default builds of GotoBLAS2 and OpenBLAS set the CPU affinity of the R process in such a way that all child processes run on the same core of the CPU. This causes serious problems for packages such as parallel/doParallel, since all of the workers are forced to use a single core.

You can work around this problem with the new "mcaffinity" function, which was added to the parallel package in R 3.0.0 specifically to address this issue. You can also use it to verify that this is your problem. Here is the output from an R session which was initially restricted to run on a single core:

> library(parallel)
> mcaffinity()
[1] 1
> mcaffinity(1:128)
[1] 1 2 3 4 5 6

After executing this, all six cores can be used. For your example, simply add mcaffinity(1:128) before executing the foreach loop.
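Applied to the code from the question, the workaround might look like the sketch below. It assumes a Unix-alike where mcaffinity is supported (it returns NULL where affinity control is unavailable), and it collects the determinants in a variable named results, which is not in the original code.

```r
library(parallel)
library(foreach)
library(doParallel)

# Widen the affinity mask of this R process before any workers are forked.
# 1:128 is only an upper bound, as in the session above; CPUs that do not
# exist are ignored.
mcaffinity(1:128)

registerDoParallel(cores = 2)
set.seed(100)

# Same loop as in the question, with the results kept in a list.
results <- foreach(i = 1:2) %dopar% {
  a <- replicate(1000, rnorm(1000))
  determinant(a)
}
```

The call has to come before the foreach loop because forked workers inherit the affinity mask of the parent process at the time they are created.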

But since you built GotoBLAS2 from source, you can also disable this feature by setting NO_AFFINITY to "1" in the Makefile and rebuilding:

# If you want to disable CPU/Memory affinity on Linux.
NO_AFFINITY = 1
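After rebuilding, one way to check whether the R process is still pinned is to compare the affinity mask with the number of cores R detects. This is a rough sketch: affinity_restricted is a hypothetical helper, and the comparison can also report a restriction imposed by something other than the BLAS library (for example a job scheduler).

```r
library(parallel)

# Hypothetical helper: TRUE when the process is allowed on fewer CPUs
# than the machine reports. A NULL mask (affinity unsupported) or an
# unknown core count is treated as "not restricted".
affinity_restricted <- function(allowed, total_cores) {
  !is.null(allowed) && !is.na(total_cores) && length(allowed) < total_cores
}

allowed <- mcaffinity()   # current affinity mask, queried without changing it
total <- detectCores()

if (affinity_restricted(allowed, total)) {
  message("R is limited to ", length(allowed), " of ", total, " cores")
} else {
  message("No affinity restriction detected")
}
```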

Upvotes: 1

Andrey Shabalin

Reputation: 4614

Most likely, GotoBLAS is already using multiple cores, so there is no gain from using %dopar%. I would also expect a slowdown from %dopar%, as you are running more threads than the number of CPU cores you have.

Still, I would not expect the code to 'run forever', just to run slower than the "for" version.

Upvotes: 1
