Set

Reputation: 944

OpenCL 1.2 compiling kernel binary using LLVM

Say I have the OpenCL kernel,

/* Header to make Clang compatible with OpenCL */

/* Test kernel */
__kernel void test(long K, const global float *A, global float *b)
{
    for (long i=0; i<K; i++)
        for (long j=0; j<K; j++)
            b[i] = 1.5f * A[K * i + j];
}

I'm trying to figure out how to compile this to a binary that can be loaded into OpenCL with the clCreateProgramWithBinary function.
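For reference, here is a minimal sketch of the host-side step that comes before clCreateProgramWithBinary: reading the compiled file into memory. The helper name read_binary_file is hypothetical, and the OpenCL call is shown only in a comment because it needs a context and device that are not set up here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a whole file into a malloc'd buffer.
 * Returns NULL on failure (including an empty file). On success,
 * *size_out holds the byte count and the caller must free() the buffer. */
unsigned char *read_binary_file(const char *path, size_t *size_out)
{
    assert(path != NULL && size_out != NULL);

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    /* Find the file size by seeking to the end. */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len <= 0) { fclose(f); return NULL; }
    rewind(f);

    unsigned char *buf = malloc((size_t)len);
    if (buf == NULL) { fclose(f); return NULL; }

    size_t got = fread(buf, 1, (size_t)len, f);
    fclose(f);
    if (got != (size_t)len) { free(buf); return NULL; }

    *size_out = (size_t)len;
    return buf;
}

/* Assuming a cl_context `ctx` and a cl_device_id `dev` already exist
 * (not shown), the buffer would then be handed over roughly like this:
 *
 *   cl_int status, err;
 *   const unsigned char *bins[] = { buf };
 *   cl_program prog = clCreateProgramWithBinary(ctx, 1, &dev, &size,
 *                                               bins, &status, &err);
 *
 * Checking both `status` and `err` shows whether the failure happens at
 * load time or only later in clBuildProgram.
 */
```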

I'm on a Mac (Intel GPU), and thus I'm limited to OpenCL 1.2. I've tried a number of different variations on the command,

clang -cc1 -triple spir test.cl -O3 -emit-llvm-bc -o test.bc -cl-std=cl1.2

but the binary always fails when I try to build the program. I'm at my wits' end with this; it's all so confusing and poorly documented.

The performance of the above test function can, in regular C, be significantly improved by applying the standard LLVM compiler optimization flag -O3. My understanding is that this optimization flag somehow takes advantage of the contiguous memory access pattern of the inner loop to improve performance. I'd be more than happy to listen to anyone who wants to fill in the details on this.
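To make the "regular C" comparison concrete, here is a sketch of a plain C counterpart of the kernel, plus a hand-simplified variant. The names test_c and test_c_last_only are hypothetical. The variant relies on one observation about the loop as written: every inner iteration overwrites b[i], so only the last j survives. An optimizer that can prove A and b do not overlap is free to drop the earlier stores, which may account for part of any measured speedup; whether a given compiler actually does so is an assumption to verify, not something this sketch demonstrates.

```c
#include <assert.h>
#include <stddef.h>

/* Plain C counterpart of the kernel: same loops, no OpenCL qualifiers. */
void test_c(long K, const float *A, float *b)
{
    assert(A != NULL && b != NULL);
    for (long i = 0; i < K; i++)
        for (long j = 0; j < K; j++)
            b[i] = 1.5f * A[K * i + j];
}

/* Hand-simplified variant. It assumes A and b never overlap (hence
 * restrict); under that assumption only the final inner iteration
 * (j == K - 1) determines b[i], so the rest of the row is skipped. */
void test_c_last_only(long K, const float *restrict A, float *restrict b)
{
    assert(A != NULL && b != NULL);
    for (long i = 0; i < K; i++)
        b[i] = 1.5f * A[K * i + (K - 1)];
}
```

With K = 2 and A = {1, 2, 3, 4}, both functions leave b = {3.0, 6.0}, so comparing their timings gives a rough idea of how much of the -O3 gain could come from removing redundant stores rather than from the memory access pattern.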

I'm also wondering how I can first convert to SPIR code, and then convert that to a buildable binary. Eventually I would like to find a way to apply the -O3 compiler optimizations to my kernel, even if I have to manually modify the SPIR (as difficult as that will be).

I've also gotten the SPIRV-LLVM-Translator tool working (as far as I can tell), and ran,

./llvm-spirv test.bc -o test.spv

but this binary fails to load at the clCreateProgramWithBinary step, so I can't even get to the build step.

Possibly SPIR-V doesn't work with OpenCL 1.2, and I would have to use clCreateProgramWithIL, which unfortunately was only added in OpenCL 2.1. It's difficult to say for sure why it doesn't work.
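When it is unclear which format a tool actually produced, one quick sanity check is to look at the first four bytes of the file. The sketch below is a hypothetical helper (detect_format and the enum names are made up for illustration) that distinguishes raw LLVM bitcode, the LLVM bitcode wrapper, and SPIR-V by their magic numbers. It says nothing about whether a particular OpenCL driver will accept the file.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    FMT_UNKNOWN,
    FMT_LLVM_BITCODE,         /* raw bitcode: 'B' 'C' 0xC0 0xDE */
    FMT_LLVM_BITCODE_WRAPPER, /* wrapper header, magic 0x0B17C0DE */
    FMT_SPIRV                 /* SPIR-V module, magic 0x07230203 */
} blob_format;

/* Classify a binary blob by its leading magic number. */
blob_format detect_format(const unsigned char *data, size_t size)
{
    assert(data != NULL);
    if (size < 4)
        return FMT_UNKNOWN;

    if (data[0] == 0x42 && data[1] == 0x43 &&
        data[2] == 0xC0 && data[3] == 0xDE)
        return FMT_LLVM_BITCODE;

    /* The wrapper magic is stored as a little-endian 32-bit word. */
    if (data[0] == 0xDE && data[1] == 0xC0 &&
        data[2] == 0x17 && data[3] == 0x0B)
        return FMT_LLVM_BITCODE_WRAPPER;

    /* SPIR-V may be written in either byte order. */
    if (data[0] == 0x03 && data[1] == 0x02 &&
        data[2] == 0x23 && data[3] == 0x07)
        return FMT_SPIRV;
    if (data[0] == 0x07 && data[1] == 0x23 &&
        data[2] == 0x02 && data[3] == 0x03)
        return FMT_SPIRV;

    return FMT_UNKNOWN;
}
```

Running this over test.bc and test.spv would at least confirm that each file is the format its extension suggests before it is handed to the OpenCL runtime.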

Please see my previous question here for some more context on this problem.

Upvotes: 1

Views: 817

Answers (1)

pmdj

Reputation: 23446

I don't believe there's any standardised bitcode file format that's available across implementations, at least at the OpenCL 1.x level.

As you're talking specifically about macOS, have you investigated Apple's openclc compiler? This is also what Xcode invokes when you compile a .cl file as part of a target. The compiler is located in /System/Library/Frameworks/OpenCL.framework/Libraries/openclc; it does have comprehensive --help output but that's not a great source for examples on how to use it.

Instead, I recommend you try the OpenCL-in-Xcode tutorial, and inspect the build commands it ends up running: https://developer.apple.com/library/archive/documentation/Performance/Conceptual/OpenCL_MacProgGuide/XCodeHelloWorld/XCodeHelloWorld.html

You'll find it produces bitcode files (.bc) for 4 "architectures": i386, x86_64, "gpu_64", and "gpu_32". It also auto-generates some C code which loads this code by calling gclBuildProgramBinaryAPPLE().

I don't know if you can untangle it further than that, but you certainly can ship GPU-independent bitcode using this compiler.

I should point out that OpenCL is deprecated on macOS, so if that's the only platform you're targeting, you really should go for Metal Compute instead. It has much better tooling and will be actively supported for longer. For cross-platform projects it might still make sense to use OpenCL even on macOS, although for shipping kernel binaries instead of source, it's likely you'll have to use platform-specific code for loading those anyway.

Upvotes: 1
