Reputation: 1

oneMKL cannot offload with OpenMP

I tried to run the official oneAPI example code and found that the following code does not actually run on the GPU.

#pragma omp target data map(to:a[0:sizea],b[0:sizeb]) map(tofrom:c[0:sizec]) device(dnum)
{
    // run gemm on gpu, use standard oneMKL interface within a variant dispatch construct
    #pragma omp target variant dispatch device(dnum) use_device_ptr(a, b, c)
    {
        cblas_zgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc);
    }
}

After setting export LIBOMPTARGET_PLUGIN_PROFILE=T, I found that the program reports no kernel time.

After setting export MKL_VERBOSE=1, I found that the MKL function ran on the GPU zero times.

I would like to know what the problem is and whether there is a solution. My Linux platform uses an Intel GPU (Intel(R) Graphics). Thanks.

Upvotes: 0

Answers (2)

TonyM

Reputation: 376

Intel oneMKL does support running on CPU and GPU as stated in the documentation here:

https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-mkl-for-dpcpp/top.html

but the cblas calls are C calls (actually built on top of a Fortran implementation) that run only on the CPU.

You should be able to make a oneMKL call within OpenMP without a problem, but as the other answer suggests, this will just run the call in parallel without affecting the device the code is targeted for.

Upvotes: 0

Jérôme Richard

Reputation: 50508

cblas_zgemm is a BLAS function call, and OpenMP is not meant to rewrite it so that it uses its own GPU-based implementation. After all, this is just a function call from the OpenMP point of view. If the linked BLAS implementation is not designed to run on a GPU, then OpenMP will not automatically convert the (compiled) code to run on a GPU (there is no such tool so far, because GPUs work very differently from CPUs). As a result, OpenMP cannot run this on the GPU if the BLAS is not meant to use the GPU.
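
To separate "the BLAS call did not offload" from "nothing offloads at all", it may help to first check whether a plain OpenMP target region reaches a device. Below is a minimal sketch using only the standard OpenMP runtime calls omp_get_num_devices and omp_is_initial_device. The helper names visible_devices and target_ran_on_host are hypothetical (they are not part of oneMKL or the oneAPI samples), and the _OPENMP guard is an assumption that lets the file still compile when OpenMP is disabled.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: number of non-host devices the OpenMP runtime sees.
   Returns 0 when the file is compiled without OpenMP support. */
int visible_devices(void)
{
#ifdef _OPENMP
    int n = omp_get_num_devices();
#else
    int n = 0;
#endif
    assert(n >= 0);
    return n;
}

/* Hypothetical helper: returns 1 if a target region executed on the host
   (fallback), 0 if it executed on an offload device. */
int target_ran_on_host(void)
{
    int on_host = 1;
#ifdef _OPENMP
    /* The flag is written inside the target region and copied back. */
    #pragma omp target map(from: on_host)
    {
        on_host = omp_is_initial_device();
    }
#endif
    return on_host;
}

/* Hypothetical helper: prints a short summary of both checks. */
void report_offload_status(void)
{
    printf("devices visible: %d\n", visible_devices());
    printf("target region ran on: %s\n",
           target_ran_on_host() ? "host" : "device");
}
```

Calling report_offload_status() from main gives a quick indication: if the target region itself falls back to the host, the problem lies in the compiler flags or the runtime setup rather than in the BLAS call.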

The oneAPI documentation mentions GPU offloading using OpenMP and BLAS, but as separate, independent points. It is not clear whether oneMKL has a GPU-based version. AFAIK, it is not available in an OpenMP program, but it may be available from SYCL/DPC++ code, though I am not sure this supports iGPUs so far.

Finally, even if you could do that, it would not be efficient on your target hardware. Intel iGPUs, like mainstream client-side PC GPUs, are not designed for fast double-precision computation, only single-precision. This is because they are designed for 3D rendering and 2D acceleration, where single precision is enough, and also because single-precision units consume far less power than double-precision ones (for the same number of items computed per second). This means a cblas_zgemm call will almost certainly be significantly faster on your CPU than on your iGPU (assuming offloading is possible at all).

Upvotes: 1
