Reputation: 115
I have compiled jBLAS with NVBLAS with a somewhat hacky solution, since the configure script was not correctly finding the libraries. I manually edited the configure.out
file of jBLAS like so, to include the NVBLAS libraries.
BUILD_TYPE=nvblas
CC=gcc
CCC=c99
CFLAGS=-fPIC -DHAS_CPUID
F77=gfortran
FOUND_JAVA=true
FOUND_NM=true
INCDIRS=-Iinclude -I/usr/lib/jvm/java-11-openjdk-amd64/include -I/usr/lib/jvm/java-11-openjdk-amd64/include/linux
JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
LAPACK_HOME=./lapack-lite-3.1.1
LD=gcc
LDFLAGS=-shared
LIB=lib
LINKAGE_TYPE=static
LOADLIBES=-Wl,-z,muldefs /home/linyi/jblas/lapack-lite-3.1.1/lapack_LINUX.a /usr/local/cuda-11.0/lib64/libnvblas.so.11 /home/linyi/jblas/lapack-lite-3.1.1/blas_LINUX.a -lgfortran
MAKE=make
NM=nm
OS_ARCH=amd64
OS_ARCH_WITH_FLAVOR=amd64/sse3
OS_NAME=Linux
RUBY=ruby
SO=so
I then ran the commands make clean all
and mvn clean package
as documented here. The tests pass successfully, but the program results in a segmentation fault on exit.
-------------------------------------------------------
T E S T S
-------------------------------------------------------
Running org.jblas.TestEigen
[NVBLAS] NVBLAS_CONFIG_FILE environment variable is set to '/home/linyi/nvblas.conf'
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.569 sec
Running org.jblas.TestComplexFloat
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 sec
Running org.jblas.TestDecompose
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.004 sec
Running org.jblas.TestBlasDouble
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.003 sec
Running org.jblas.TestBlasDoubleComplex
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 sec
Running org.jblas.TestSingular
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.004 sec
Running org.jblas.TestDoubleMatrix
Tests run: 37, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.022 sec
Running org.jblas.TestSolve
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 sec
Running org.jblas.TestBlasFloat
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 sec
Running org.jblas.TestFloatMatrix
Tests run: 37, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.012 sec
Running org.jblas.SimpleBlasTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec
Running org.jblas.ranges.RangeTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 sec
Running org.jblas.TestGeometry
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 sec
Running org.jblas.ComplexDoubleMatrixTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec
-- org.jblas INFO Deleting /tmp/jblas4383455253907276334/libjblas.so
-- org.jblas INFO Deleting /tmp/jblas4383455253907276334/libjblas_arch_flavor.so
-- org.jblas INFO Deleting /tmp/jblas4383455253907276334
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fa1f6bb96b1, pid=8063, tid=8072
#
# JRE version: OpenJDK Runtime Environment (11.0.8+10) (build 11.0.8+10-post-Ubuntu-0ubuntu118.04.1)
# Java VM: OpenJDK 64-Bit Server VM (11.0.8+10-post-Ubuntu-0ubuntu118.04.1, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C [libcublas.so.11+0xa096b1]
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/linyi/jblas/jblas/core.8063)
#
# An error report file with more information is saved as:
# /home/linyi/jblas/jblas/hs_err_pid8063.log
#
# If you would like to submit a bug report, please visit:
# https://bugs.launchpad.net/ubuntu/+source/openjdk-lts
#
Aborted (core dumped)
Results :
Tests run: 122, Failures: 0, Errors: 0, Skipped: 0
I decided to run mvn clean package -DskipTests
as it seemed that the tests were passing fine, just that the program causes a segmentation fault on termination. When I used the libary in my Java project however, nvblas.log
reveals that although NVBLAS was intercepting the calls to the BLAS routines, they were in fact being executed on the CPU instead of the GPU. Running nvprof --print-gpu-summary
with my program also resulted in the same conclusion.
#
==7711== Profiling result:
Type Time(%) Time Calls Avg Min Max Name
GPU activities: 100.00% 1.8240us 1 1.8240us 1.8240us 1.8240us [CUDA memcpy HtoD]
======== Error: Application received signal 134
And the contents of nvblas.log
are as follows:
[NVBLAS] Using devices :0
[NVBLAS] Config parsed
[NVBLAS] dgemm[cpu]: ta=N, tb=N, m=1, n=1, k=1
[NVBLAS] dsyr2k[cpu]: up=U, ta=N, n=24, k=28
[NVBLAS] dsyr2k[cpu]: up=U, ta=N, n=32, k=28
[NVBLAS] dsyr2k[cpu]: up=U, ta=N, n=26, k=28
[NVBLAS] dsyr2k[cpu]: up=U, ta=N, n=22, k=28
[NVBLAS] dsyr2k[cpu]: up=U, ta=N, n=20, k=28
[NVBLAS] dsyr2k[cpu]: up=U, ta=N, n=26, k=28
[NVBLAS] dtrmm[cpu]: si=R, up=U, ta=N, di=U, m=52, n=31
[NVBLAS] dtrmm[cpu]: si=R, up=U, ta=N, di=U, m=54, n=31
[NVBLAS] dtrmm[cpu]: si=R, up=L, ta=T, di=N, m=52, n=31
[NVBLAS] dtrmm[cpu]: si=R, up=L, ta=T, di=N, m=54, n=31
[NVBLAS] dtrmm[cpu]: si=R, up=U, ta=N, di=U, m=60, n=31
[NVBLAS] dtrmm[cpu]: si=R, up=U, ta=T, di=U, m=54, n=31
[NVBLAS] dtrmm[cpu]: si=R, up=U, ta=T, di=U, m=52, n=31
[NVBLAS] dtrmm[cpu]: si=R, up=L, ta=T, di=N, m=60, n=31
[NVBLAS] dsyr2k[cpu]: up=U, ta=N, n=22, k=28
[NVBLAS] dtrmm[cpu]: si=R, up=U, ta=T, di=U, m=60, n=31
[NVBLAS] dtrmm[cpu]: si=R, up=U, ta=N, di=U, m=54, n=22
[NVBLAS] dgemm[cpu]: ta=T, tb=N, m=54, n=22, k=31
[NVBLAS] dtrmm[cpu]: si=R, up=U, ta=N, di=U, m=60, n=28
[NVBLAS] dtrmm[cpu]: si=R, up=U, ta=N, di=U, m=52, n=20
[NVBLAS] dtrmm[cpu]: si=R, up=L, ta=T, di=N, m=54, n=22
[NVBLAS] dgemm[cpu]: ta=T, tb=N, m=60, n=28, k=31
[NVBLAS] dgemm[cpu]: ta=T, tb=N, m=52, n=20, k=31
[NVBLAS] dgemm[cpu]: ta=N, tb=T, m=31, n=54, k=22
. . .
I am really at a loss as to what to do, I hope someone can offer any advice, this seems to be really poorly documented.
Upvotes: 3
Views: 218
Reputation: 115
I believe I have figured it out.
nm libjblas.so | grep -is dgemm
U cblas_dgemm
U dgemm_@@libnvblas.so.11
This shows that the libary is indeed linking correctly. I then proceeded to run the built in benchmark of jBlas, by just running java -jar jblas.jar
where jblas.jar
is the compiled library, and apparently GPU offload only occurs for large matrices, as no gpu calls were made when n=10
or n=100
but nvblas.log
logged GPU computation at n=1000
. This was very confusing for me, I hope this helps anyone else struggling with this issue.
Upvotes: 0