Reputation: 3
I thought I had a very clear understanding of this until two days ago, but now I might be overthinking it and confusing myself. I'll explain what I'm doing and then ask a couple of probably simplistic questions. I've searched, but so far I've only found conflicting answers. Surely someone can set me straight.
I have written a Fortran code that uses a LAPACK routine to solve an eigenvalue problem. My problem setup is (A-LB)x=0, where L is my eigenvalue, x is my eigenvector(s), and A and B are square, complex, non-symmetric, non-Hermitian, non-triangular matrices. A and B are both NxN, and N in my code will typically be between 1000 and 3000.
Right now the code works perfectly. I'm using an optimized ATLAS install with LAPACK. I'm specifically running routine ZGGEV because, for now, I need ALL eigenvalue solutions and ALL associated eigenvector solutions.
Now I'm trying to optimize my code to run faster. All of the computers in our lab have 4- or 8-core CPUs and run Ubuntu. Is there anything I can do to use my full CPU when solving this problem? I've been looking into the following things:
Finally, I have a few specific BLAS questions:
Hopefully someone can clear up some of my BLAS questions and point me toward a faster solution method. Thanks!
Upvotes: 0
Views: 639
Reputation: 3612
You are correct to expect multi-threaded behavior mainly from the BLAS routines and not from LAPACK itself. Your matrices are big enough to benefit from a multi-threaded environment. I am not sure about the extent of BLAS usage in the ZGGEV routine, but it should be more than a brief spike.
Regarding your specific questions.
libopenblas_*.a is a copy of, or a soft link to, libopenblas.a. The thread count is again fixed at compile time. Please check the log files and standard output from the library builds and verify that they detected the correct number of CPUs.
I noticed that you mentioned more than one machine. Note that ATLAS is an automatically tuned library, so you have to recompile it on each machine. OpenBLAS, on the other hand, accepts the DYNAMIC_ARCH=1 option in make. A library built this way selects the optimized routines for each machine at run time.
My suggestion for your multi-threaded test is to build OpenBLAS using
$ make DYNAMIC_ARCH=1 NUM_THREADS=8
Then CALL ZGEMM in your program. This routine is definitely optimized and should show multi-threaded behavior.
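A minimal sketch of such a test is below. It is only an illustration, not part of your code: the program name, the matrix size n = 2000 (picked from your 1000 to 3000 range), and the random fill are all assumptions. It times a single ZGEMM call with system_clock, which reports wall time; cpu_time would add up the time of all threads and hide any speedup.

```fortran
program zgemm_thread_test
  use iso_fortran_env, only: int64
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 2000   ! assumed test size, not taken from the original code
  complex(dp), parameter :: one  = (1.0_dp, 0.0_dp)
  complex(dp), parameter :: zero = (0.0_dp, 0.0_dp)
  complex(dp), allocatable :: a(:,:), b(:,:), c(:,:)
  real(dp), allocatable :: re(:,:), im(:,:)
  integer(int64) :: t0, t1, rate
  external :: zgemm

  allocate(a(n,n), b(n,n), c(n,n), re(n,n), im(n,n))

  ! Fill A and B with random complex entries.
  call random_number(re)
  call random_number(im)
  a = cmplx(re, im, kind=dp)
  call random_number(re)
  call random_number(im)
  b = cmplx(re, im, kind=dp)

  ! C = 1*A*B + 0*C, timed by wall clock.
  call system_clock(t0, rate)
  call zgemm('N', 'N', n, n, n, one, a, n, b, n, zero, c, n)
  call system_clock(t1)

  print '(a,i0,a,f8.3,a)', 'ZGEMM, n = ', n, ': ', &
        real(t1 - t0, dp) / real(rate, dp), ' s wall time'
  ! Using the result keeps the compiler from discarding the call.
  print *, 'checksum: ', sum(c)
end program zgemm_thread_test
```

Assuming OpenBLAS is installed where the linker can find it, something like gfortran zgemm_test.f90 -o zgemm_test -lopenblas should build it. Run it once with OPENBLAS_NUM_THREADS=1 and once with OPENBLAS_NUM_THREADS=8 set in the environment and compare the wall times; watching top during the run should also show more than one core busy.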
Upvotes: 1