Reputation: 1480
I read that the Clang compiler can offload OpenMP regions to GPUs, but I am confused about how to compile the code with Clang. The Clang version installed on our cluster is 3.9.0 (tags/RELEASE_390/final 288133). The code I want to offload is basically a matrix-matrix multiplication:
#pragma omp target parallel for shared(C,P,T) private(i,j,k)
for (i=0; i<N; i++) {
    for (j=0; j<N; j++) {
        for (k=0; k<N; k++) {
            C[i][j] += P[i][k]*T[k][j];
        }
    }
}
I am compiling with
clang -O3 -fopenmp-targets=x86_64-unknown-linux-gnu mm.c
and get this warning:
clang-3.9: warning: argument unused during compilation: '-fopenmp-targets=x86_64-unknown-linux-gnu'
What I don't know is whether my installed version of Clang is able to offload code to GPUs and, if so, how to do it. Any comment is welcome.
Upvotes: 4
Views: 2492
Reputation: 1239
Short answer: According to the Clang 3.9.0 Release Notes it is not:
Currently Clang supports OpenMP 3.1 and all non-offloading features of OpenMP 4.0/4.5. Offloading features are under development.
I couldn't find anything in the newer release notes on whether offloading has been officially added in a later version. There seems to be some support, though. The libomptarget documentation lists two plugins:
generic 64-bit, for the following targets:
    powerpc64-ibm-linux-gnu
    powerpc64le-ibm-linux-gnu
    x86_64-pc-linux-gnu
cuda, for the following targets (see this):
    nvptx-nvidia-cuda
    nvptx64-nvidia-cuda
Maybe this document helps you if you want to try building your own Clang.
A list of OpenMP implementations can be found here. GCC 6.1 and newer fully support OpenMP 4.5. Offloading seems to be supported by GCC 5 and later, but for Nvidia GPUs only via OpenACC; see here.
In general, support for the target construct, i.e. offloading, does not imply GPU (or any specific device) support, because each device needs its own device-specific implementation. The trivial form of support is offloading to the host device, e.g. an x86 CPU, which is useful only for testing or for compatibility when no accelerator device is present.
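To see whether a target region really ran on an accelerator or silently fell back to the host, something like the following sketch may help. It relies on omp_is_initial_device(), which is part of the OpenMP 4.0 runtime API; the function name target_ran_on_host is only an illustrative choice, and the #ifdef guard is an assumption so that the file still builds without -fopenmp.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns 1 if the target region executed on the host
 * (no offloading happened), 0 if it executed on an accelerator device. */
int target_ran_on_host(void) {
    int on_host = 1; /* default when OpenMP is disabled at compile time */
#ifdef _OPENMP
    /* map(from:) copies the flag back from the device after the region. */
    #pragma omp target map(from: on_host)
    {
        on_host = omp_is_initial_device() ? 1 : 0;
    }
#endif
    return on_host;
}
```

Calling target_ran_on_host() from main and printing the result gives a quick sanity check: a compiler without device support, or a machine without a usable device, should report the host.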
Hope that helps.
Upvotes: 2
Reputation: 763
Since version 7, clang supports OpenMP offloading to Nvidia GPUs (http://releases.llvm.org/7.0.0/tools/clang/docs/ReleaseNotes.html#id12):
Clang gained basic support for OpenMP 4.5 offloading for NVPTX target.
To compile your program for the NVPTX target, use the following options:
-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda
for 64-bit platforms, or
-fopenmp -fopenmp-targets=nvptx-nvidia-cuda
for 32-bit platforms.
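For what it's worth, the loop from the question will probably also need explicit map clauses so that the matrices are actually copied to and from the device. Below is a minimal sketch, not the asker's exact code: it assumes the matrices are stored as flat row-major arrays of n*n doubles, and the function name matmul_offload is made up for illustration.

```c
/* Sketch: C += P * T for n x n row-major matrices, offloaded via OpenMP.
 * Assumption: flat arrays instead of the 2D arrays used in the question. */
void matmul_offload(int n, const double *P, const double *T, double *C) {
    /* map(to:) copies inputs to the device; map(tofrom:) copies C both ways.
     * collapse(2) spreads the i and j loops across teams and threads. */
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: P[0:n*n], T[0:n*n]) map(tofrom: C[0:n*n])
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0; /* per-iteration accumulator, private by scope */
            for (int k = 0; k < n; k++) {
                sum += P[i * n + k] * T[k * n + j];
            }
            C[i * n + j] += sum;
        }
    }
}
```

Compiled with the options above this should target the GPU; compiled without -fopenmp the pragma is ignored and the same loop runs serially on the host, which makes it easy to compare results.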
Upvotes: 2