Reputation: 1
I ran the official code from the oneAPI examples and found that the following code does not actually run on the GPU:
#pragma omp target data map(to:a[0:sizea],b[0:sizeb]) map(tofrom:c[0:sizec]) device(dnum)
{
// run gemm on gpu, use standard oneMKL interface within a variant dispatch construct
#pragma omp target variant dispatch device(dnum) use_device_ptr(a, b, c)
{
cblas_zgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc);
}
}
I know this because, after setting export LIBOMPTARGET_PLUGIN_PROFILE=T,
the profile reports no kernel time for the program, like this:
Also, after setting export MKL_VERBOSE=1,
the output shows that the MKL function ran on the GPU 0 times, like this:
I would like to know what the problem is and whether there is a solution. My Linux platform uses an Intel GPU (Intel(R) Graphics). Thanks.
Upvotes: 0
Views: 252
Reputation: 376
Intel oneMKL does support running on both CPU and GPU, as stated in the documentation here:
However, the cblas calls are C calls (actually built on top of a Fortran implementation) that run only on the CPU.
You should be able to make a oneMKL call within OpenMP without a problem but, as the other answer suggests, this will just run the call in parallel without changing the device the code is targeted for.
Upvotes: 0
Reputation: 50508
cblas_zgemm is a BLAS function call, and OpenMP is not meant to rewrite it to use its own GPU-based implementation. From OpenMP's point of view, this is just a function call. If the linked BLAS implementation is not designed to run on a GPU, OpenMP will not automatically convert the (compiled) code to GPU code (there is no such tool so far, because GPUs work very differently from CPUs). As a result, OpenMP cannot run this on the GPU unless the BLAS library is meant to use the GPU.
The oneAPI documentation mentions GPU offloading with OpenMP and BLAS, but as separate, independent points. It is not clear whether oneMKL has a GPU-based version. AFAIK, it is not available in an OpenMP program, but it may be available from SYCL/DPC++ code, though I am not sure this supports iGPUs so far.
Finally, even if you could do that, it would not be efficient on your target hardware. Intel iGPUs, like mainstream (i.e. client-side) PC GPUs, are not designed for fast double-precision computation, only single-precision. This is because they are designed for 3D rendering and 2D acceleration, where single precision is enough, and also because single-precision units consume far less power than double-precision ones (for the same number of items computed per second). This means a cblas_zgemm call will almost certainly be significantly faster on your CPU than on your iGPU (assuming offloading is possible).
Upvotes: 1