Reputation: 13
I am new to OpenCL and am learning it from Matthew Scarpino's “OpenCL in Action”. I tried the example of multiplying a matrix by a vector (pp. 11-13). For some reason, the example did not work on my computer: the kernel did not return any values. So I began to look for simpler examples of getting data out of a kernel.
On Wesley Shillingford’s YouTube channel I found an example that outputs the string “Hello world!” from the kernel. On my home computer, the example worked. However, the OpenCL "kitchen" remained closed to me, because the example was written in C++ and the brevity of the code obscured what was actually happening. So I kept looking for examples in C.
Among the answers on Stack Overflow, I found an example of a minimal OpenCL program that increments a value in the kernel. I took this code as the basis for my program because it is simple and convenient for a beginner. As I later found out, the example contained an error.
Another great example convinced me to use pointers to return data from the kernel. When I used arrays to store the kernel's output, the values of the target array did not change and the data from the kernel was lost on output. I concluded that pointers are needed to get data out of the kernel. However, this did not help either: the problem of transferring data from the kernel to the host program remains. Please correct me if I am mistaken about anything.
In short: the kernel does not return character data to the host program. What could be the problem?
#include <CL/cl.h>
#include <stdio.h>
#include <stdlib.h>
int main(){
cl_platform_id *platforms =NULL;
cl_device_id *devices=NULL;
cl_context context;
cl_command_queue cmdQueue;
cl_program program;
cl_kernel kernel = NULL;
char *cpOutputData;
int output_size = 8;
cl_mem output_buff;
cl_int status; // to check the output of each API call
const char *source =
"__kernel void Hello( __global char* ch) {\n"
" ch[0]='P';"
" ch[1]='r';"
" ch[2]='i';"
" ch[3]='v';"
" ch[4]='e';"
" ch[5]='t';"
" ch[6]='!';"
" ch[7]='\0';"
"}\0";
printf("GetPlatformIDs... ");
cl_uint numPlatforms = 0;
//STEP 1: Discover and initialize platforms
// Use clGetPlatformIDs to retrieve the number of platforms
status = clGetPlatformIDs(0,
NULL,
&numPlatforms);
// Allocate enough space for each platform
platforms = (cl_platform_id*)malloc(numPlatforms*sizeof(cl_platform_id));
// Fill in platforms with clGetPlatformIDs()
status=clGetPlatformIDs(numPlatforms,
platforms,
NULL);
printf("\nNumber of discovered platforms is %d. ", numPlatforms);
// STEP 2: Discover and initialize devices
printf("OK.\nGetDeviceIDs... ");
cl_uint numDevices = 0;
// Use clGetDeviceIDs() to retrieve the number of devices present
status = clGetDeviceIDs(platforms[0],
CL_DEVICE_TYPE_CPU,
0,
NULL,
&numDevices);
// Allocate enough space for each device
devices = (cl_device_id*)malloc(numDevices*sizeof(cl_device_id));
// Fill in devices with clGetDeviceIDs()
clGetDeviceIDs(platforms[0],
CL_DEVICE_TYPE_CPU,
numDevices,
devices,
NULL);
printf("\nNumber of discovered devices is %d. ", numDevices);
// STEP 3: Create a context
printf("OK.\nCreating context... ");
// Create context using clCreateContext() and associate it with the devices
context = clCreateContext(NULL,
numDevices,
devices,
NULL,
NULL,
&status);
// STEP 4: Create a command queue
printf("OK.\nQueue creating... ");
cmdQueue = clCreateCommandQueue(context,
devices[0],
CL_QUEUE_PROFILING_ENABLE,
&status);
// STEP 5: Create device buffers
printf("OK.\nOutput buffer creating... ");
output_buff = clCreateBuffer(context,
CL_MEM_WRITE_ONLY,
sizeof(char)*output_size,
NULL,
&status);
// STEP 6: Create and compile program
printf("OK.\nBuilding program... ");
// Create a program using clCreateProgramWithSource()
program = clCreateProgramWithSource(context,
1,
(const char**)&source,
NULL,
&status);
// Build (compile) the program for the devices with clBuildProgram()
status=clBuildProgram(program,
numDevices,
devices,
NULL,
NULL,
NULL);
// STEP 7: Create a kernel
printf("OK.\nCreating kernel... ");
kernel = clCreateKernel(program,
"Hello",
&status);
// STEP 8: Set kernel arguments
// Associate output buffer with the kernel
printf("OK.\nSetting kernel arguments... ");
status = clSetKernelArg(kernel,
0,
sizeof(cl_mem),
&output_buff);
// STEP 9: Configure the work-item structure
// Define an index space (global work size) of work items for execution.
// A workgroup size (local work size) is not required, but can be used.
size_t globalWorkSize[1];
// There are 'elements' work-items
globalWorkSize[0] = output_size;
// STEP 10: Enqueue the kernel for execution
printf("OK.\nExecuting kernel... ");
//Execute the kernel by using clEnqueueNDRangeKernel().
// 'globalWorkSize' is the 1D dimension of the work-items
clEnqueueNDRangeKernel(cmdQueue,
kernel,
1,
NULL,
globalWorkSize,
NULL,
0,
NULL,
NULL);
clFinish(cmdQueue);
// STEP 11: Read the output buffer back to the host
printf("OK.\nReading buffer... ");
// Allocate space for the data to be read
cpOutputData = (char*)malloc(output_size*sizeof(char));
// Use clEnqueueReadBuffer() to read the OpenCL output buffer to the host output array
clEnqueueReadBuffer(cmdQueue,
output_buff,
CL_TRUE,
0,
output_size,
cpOutputData,
0,
NULL,
NULL);
printf("\nPrinting output data: \n");
printf("%s", cpOutputData); // never pass device data as the format string
// STEP 12: Releasing resources
printf("\n...Releasing OpenCL resources... ");
clReleaseKernel(kernel);
clReleaseProgram(program);
clReleaseCommandQueue(cmdQueue);
clReleaseMemObject(output_buff);
clReleaseContext(context);
printf("OK.\n...Releasing host resources... ");
free(cpOutputData);
free(platforms);
free(devices);
printf("OK.\nEnd of program. Bey!\n");
system("PAUSE");
return 0;
}
My program's execution output is here.
Upvotes: 1
Views: 328
Reputation: 23428
The problem you're running into is VERY subtle, and unfortunately you don't have error checking in the one place that would have caught it. Specifically, compiling the source of your kernel using clBuildProgram fails, and the status it returns isn't checked. I'm not sure why the rest of the program doesn't produce errors on your implementation; it certainly does on mine.
The reason your kernel source isn't valid is this line:
" ch[7]='\0';"
//       ^^---- This terminates the string early!
Basically, your kernel source code looks like this to the OpenCL compiler:
__kernel void Hello( __global char* ch) {
ch[0]='P';
ch[1]='r';
ch[2]='i';
ch[3]='v';
ch[4]='e';
ch[5]='t';
ch[6]='!';
ch[7]='
because the escape code \0 in your string literal inserts an actual nul character in the memory that the source variable ends up pointing to, causing it to be treated as the end of your kernel source code.
What you actually want is for the escape sequence to appear in your OpenCL kernel's code, so you need to escape it twice: once for the C compiler of your host program, and the second time for your OpenCL compiler. That would be:
" ch[7]='\\0';"
//        ^--- note second backslash
The double backslash is converted to a single backslash in your source string, where the OpenCL compiler combines it with the subsequent zero to turn the character literal into a nul character.
With that fix, it all works!
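The difference between the two literals can be seen on the host alone, without any OpenCL calls. The following is a minimal sketch in plain C; the names single_escape, double_escape, visible_length and show_source are made up for this illustration and do not come from the question. It relies on the fact that, when the lengths argument of clCreateProgramWithSource is NULL (as in the question), each source string is treated as nul-terminated, so what strlen sees is what the OpenCL compiler gets.

```c
#include <stdio.h>
#include <string.h>

/* Same statement as in the question: the host C compiler turns \0 into a
   real nul byte, so the string effectively ends right after the quote. */
const char *single_escape = " ch[7]='\0';";

/* Doubled backslash: the string holds a backslash followed by '0',
   which the OpenCL compiler can then read as the escape sequence \0. */
const char *double_escape = " ch[7]='\\0';";

/* Number of characters visible to any consumer that stops at the first nul. */
size_t visible_length(const char *src) {
    return strlen(src);
}

/* Hypothetical helper: prints the text a nul-terminated reader would see. */
void show_source(const char *label, const char *src) {
    printf("%s: %zu chars: [%s]\n", label, visible_length(src), src);
}
```

Calling show_source on both strings shows the first one cut off after the opening quote of the character literal, while the second one contains the complete statement.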
I recommend writing kernel source code in a separate file. Either load that file using file I/O in your program, or auto-generate the literal for the data to embed in your source code. The Unix tool xxd can do this using the -i flag; you can probably find a Windows equivalent, or even a Windows build of that tool itself.
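For the file I/O route, one possible sketch in standard C is shown below. The function name load_text_file is hypothetical, and so is any file name you pass to it (for example a hello.cl next to the executable). The function reads the whole file into a nul-terminated heap buffer that could then be handed to clCreateProgramWithSource in place of the source literal.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a whole file into a malloc'ed, nul-terminated
   buffer. Returns NULL on any failure; the caller must free() the result.
   If out_len is not NULL, it receives the number of bytes read. */
char *load_text_file(const char *path, size_t *out_len) {
    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;

    /* Determine the file size by seeking to the end. */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);
    if (size < 0) { fclose(f); return NULL; }
    rewind(f);

    /* One extra byte for the terminating nul. */
    char *buf = (char *)malloc((size_t)size + 1);
    if (buf == NULL) { fclose(f); return NULL; }

    size_t got = fread(buf, 1, (size_t)size, f);
    fclose(f);
    if (got != (size_t)size) { free(buf); return NULL; }

    buf[got] = '\0';
    if (out_len != NULL) *out_len = got;
    return buf;
}
```

Inside such a file the kernel is written as ordinary OpenCL C, so the last statement is simply ch[7]='\0'; with a single backslash, because no host string literal is involved any more.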
Upvotes: 1