I have code that contains know-how I would rather not distribute as source. One solution is to ship a set of precompiled kernel binaries and choose the right one at runtime depending on the user's hardware.
How can I cover most users (AMD and Intel; NVIDIA can use CUDA code) with as few binaries as possible, and with as few machines as possible on which I have to run the offline compiler? Are there GPU families that can share the same binary? The CUDA compiler can target several architectures; what about OpenCL? Binary compatibility doesn't seem to be well documented, but maybe someone has collected this data for themselves.
I know about SPIR, but older hardware doesn't support it.
Here are the details of my implementation, in case someone finds this question and has done less than I have. I wrote a tool that compiles the kernel and saves the binary to a file, and then I collected all these binaries into a C array that is included in the main application:
const char* binaries[] = {
//kernels/HD Graphics 4000
"\x62\x70\x6c\x69\x73\x74\x30\x30\xd4\x01\x02\x03"
"\x04\x05\x06\x07\x08\x5f\x10\x0f\x63\x6c\x42\x69"
"\x6e\x61\x72\x79\x56\x65\x72\x73\x69\x6f\x6e\x5c"
...
"\x00\x00\x00\x00\x00\x00\x00\x09\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x06\x47\xe0"
,
//here more kernels
};
size_t binaries_sizes[] = {
204998,
205907,
...
};
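The step of turning each saved binary into an entry of that array can be automated. Below is a minimal C++ sketch, not the tool used here: the helper names `toCStringLiteral` and `readFile` are hypothetical, and it assumes the offline compiler has already written each kernel binary to its own file. It escapes every byte as `\xNN`, 12 bytes per line like the array above; escaping every byte also sidesteps the C pitfall where a `\x` escape greedily consumes following hex-digit characters.

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: formats raw bytes as C string-literal lines of
// "\xNN" escapes, bytesPerLine bytes per line.
std::string toCStringLiteral(const std::vector<unsigned char>& bytes,
                             std::size_t bytesPerLine = 12) {
    if (bytes.empty()) return "\"\"\n";  // keep the output a valid literal
    std::string out;
    char buf[5];  // "\xNN" plus the terminating NUL
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (i % bytesPerLine == 0) out += '"';  // open a new literal line
        std::snprintf(buf, sizeof buf, "\\x%02x", static_cast<unsigned>(bytes[i]));
        out += buf;
        // close the line when it is full or when the data ends
        if (i % bytesPerLine == bytesPerLine - 1 || i + 1 == bytes.size())
            out += "\"\n";
    }
    return out;
}

// Hypothetical helper: reads a whole file (e.g. a saved kernel binary).
std::vector<unsigned char> readFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```

For each binary file, `toCStringLiteral(readFile(path))` followed by a comma becomes one entry of `binaries[]`, and the size of the vector returned by `readFile` becomes the matching entry of `binaries_sizes[]`.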
Then I use the following code, which iterates over all the kernels. I couldn't come up with anything cleverer than trial and error (pick the first binary that builds successfully); there is probably a better solution:
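cl_int err = CL_INVALID_BINARY; // any non-success value, so the loop runs at least once
size_t i = 0;
const size_t numBinaries = sizeof(binaries) / sizeof(binaries[0]);
while (err != CL_SUCCESS) {
    if (i == numBinaries) {
        throw Error(); // no stored binary could be loaded and built for this device
    }
    program = clCreateProgramWithBinary(context, 1, &deviceIds[devIdx], &binaries_sizes[i],
                                        (const unsigned char**)&binaries[i],
                                        nullptr, &err);
    if (err == CL_SUCCESS) {
        err = clBuildProgram(program, 1, &deviceIds[devIdx], "", nullptr, nullptr);
        if (err != CL_SUCCESS) {
            clReleaseProgram(program); // release the failed program before trying the next binary
        }
    }
    ++i;
}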
Unfortunately, there is no standard solution to your problem. OpenCL is platform-independent, and apart from SPIR there is no standard way to deal with this. Each vendor uses its own internal compiler toolchain, and the binary format can change between versions of the same driver or between devices.
You could add some metadata to each kernel to record which platform and device you compiled it for, which saves you the trial-and-error step: instead of storing only binaries and binaries_sizes, also store binary_platform and binary_device, then iterate through those arrays to find the binary you should load.
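A minimal C++ sketch of that metadata lookup, assuming the names are compared as exact strings. `BinaryEntry` and `findBinary` are hypothetical names; in the real application the two strings would come from `clGetPlatformInfo` with `CL_PLATFORM_NAME` and `clGetDeviceInfo` with `CL_DEVICE_NAME`. Because the binary format can also change between driver versions, a name match is only a strong hint, so the `clBuildProgram` result should still be checked.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical table entry: a precompiled binary stored together with the
// platform and device names it was built for.
struct BinaryEntry {
    std::string platform;
    std::string device;
    const unsigned char* data;
    std::size_t size;
};

// Returns the index of the binary built for this platform/device pair,
// or -1 if none matches.
int findBinary(const std::vector<BinaryEntry>& table,
               const std::string& platformName,
               const std::string& deviceName) {
    for (std::size_t i = 0; i < table.size(); ++i) {
        if (table[i].platform == platformName && table[i].device == deviceName)
            return static_cast<int>(i);
    }
    return -1;
}
```

If `findBinary` returns -1, the trial-and-error loop from the question can still serve as a fallback.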
The best solution for you would be SPIR (or the newer SPIR-V), intermediate representations that the OpenCL driver then compiles to the device's actual instruction set. If you store your kernels as SPIR-V and know your way around compiler tooling, you can also use a translator tool to recover LLVM IR and then compile it down to other targets, such as AMD GPUs or NVIDIA PTX, using the LLVM infrastructure (see https://github.com/KhronosGroup/SPIRV-LLVM).
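Since older hardware lacks SPIR support, an application shipping both forms would have to pick a path per device at runtime. This C++ sketch assumes the caller has already queried the device's extension string (`clGetDeviceInfo` with `CL_DEVICE_EXTENSIONS`) and, on OpenCL 2.1 or later, its `CL_DEVICE_IL_VERSION` string (passed as empty on older devices). `hasToken`, `KernelSource` and `chooseSource` are hypothetical names, and the preference order (SPIR-V, then SPIR 1.2 via the `cl_khr_spir` extension, then a precompiled per-device binary) is an assumed policy, not something OpenCL prescribes.

```cpp
#include <sstream>
#include <string>

// Returns true if `token` appears as a whole word in a space-separated list,
// so an extension name that is a prefix of another one does not match.
bool hasToken(const std::string& list, const std::string& token) {
    std::istringstream words(list);
    std::string w;
    while (words >> w)
        if (w == token) return true;
    return false;
}

enum class KernelSource { SpirV, Spir12, PrecompiledBinary };

// Hypothetical policy: prefer SPIR-V, then SPIR 1.2, and fall back to the
// precompiled binary table for devices that support neither.
KernelSource chooseSource(const std::string& extensions,
                          const std::string& ilVersion) {
    if (ilVersion.find("SPIR-V") != std::string::npos ||
        hasToken(extensions, "cl_khr_il_program"))
        return KernelSource::SpirV;
    if (hasToken(extensions, "cl_khr_spir"))
        return KernelSource::Spir12;
    return KernelSource::PrecompiledBinary;
}
```

Keeping the decision in a pure function like this makes it easy to test without a GPU; only the string queries need a real device.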