Reputation: 11
I have two kernels that are launched inside the loop shown below. I want to measure the total execution time of all kernel launches, even though the first kernel may be run, for example, 10 times and the second 5 times.
Thank you very much.
for (int arrayLength = minLengthArray; arrayLength <= N; arrayLength *= 2)
{
    int threadCount = 0;
    int batchSize = N / arrayLength;
    int GroupCount = (batchSize * arrayLength) / SHARED_SIZE_LIMIT;
    size_t local = SHARED_SIZE_LIMIT / 2;
    size_t global = local * GroupCount;
    if (arrayLength <= SHARED_SIZE_LIMIT)
    {
        err = clEnqueueNDRangeKernel(commands, ddEvenMergeSortSharedKernel, 1, NULL, &global, &local, 0, NULL, NULL);
    }
    else
    {
        clSetKernelArg(oddEvenMergeSortSharedKernel, 5, sizeof(unsigned int), &SHARED_SIZE_LIMIT);
        err = clEnqueueNDRangeKernel(commands, oddEvenMergeSortSharedKernel, 1, NULL, &global, &local, 0, NULL, NULL);
        for (int size = 2 * SHARED_SIZE_LIMIT; size <= arrayLength; size <<= 1)
            for (unsigned stride = size / 2; stride > 0; stride >>= 1)
            {
                global = batchSize * arrayLength;
                local = 256;
                err = clEnqueueNDRangeKernel(commands, oddEvenMergeSortGlobalKernel, 1, NULL, &global, &local, 0, NULL, NULL);
            }
    }
}
Upvotes: 1
Views: 2129
Reputation: 54639
It partially depends on how you want to "schedule" the timing.
Some general hints:
You'll have to enable event profiling for your command queue by passing the CL_QUEUE_PROFILING_ENABLE flag when you create it:
commands = clCreateCommandQueue(
context, device, CL_QUEUE_PROFILING_ENABLE, &err);
In your loop, you have to create events for your kernel calls. Here you have different options: you could either collect the events for the kernel calls (in some vector, list or array) and evaluate them later, or perform the timing of each event individually. The basic approach would be as follows, sketched for the first kernel (error handling omitted):
cl_event event0; // Create an event variable for timing
clEnqueueNDRangeKernel(commands, ddEvenMergeSortSharedKernel,
    1, NULL, &global, &local, 0, NULL, &event0); // Pass in the event here
clWaitForEvents(1, &event0); // Wait for the kernel to finish
// Obtain the start and end time of the event.
// cl_ulong is always 64 bits wide; unsigned long may be only 32 bits
// on some platforms, which would not match sizeof(cl_ulong) below.
cl_ulong start = 0;
cl_ulong end = 0;
clGetEventProfilingInfo(event0, CL_PROFILING_COMMAND_START,
    sizeof(cl_ulong), &start, NULL);
clGetEventProfilingInfo(event0, CL_PROFILING_COMMAND_END,
    sizeof(cl_ulong), &end, NULL);
// Compute the duration in nanoseconds
cl_ulong duration = end - start;
// Don't forget to release the event
clReleaseEvent(event0);
The duration (in nanoseconds) may then be accumulated accordingly. The same scheme could be used for the second kernel.
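The accumulation step itself can be sketched independently of OpenCL. The following is only an illustration under stated assumptions: `KernelTimer`, `timer_add` and `timer_total_ms` are hypothetical names, not part of any API, and `uint64_t` stands in for `cl_ulong` so that the snippet compiles without the OpenCL headers.

```c
#include <assert.h>
#include <stdint.h>

/* Running total for one kernel (hypothetical helper type). */
typedef struct {
    uint64_t total_ns;  /* sum of (end - start) over all recorded launches */
    unsigned launches;  /* number of launches recorded so far */
} KernelTimer;

/* Record one launch. start_ns and end_ns stand in for the values that
   clGetEventProfilingInfo returns for CL_PROFILING_COMMAND_START and
   CL_PROFILING_COMMAND_END (assumption: uint64_t replaces cl_ulong). */
void timer_add(KernelTimer *t, uint64_t start_ns, uint64_t end_ns)
{
    assert(end_ns >= start_ns);
    t->total_ns += end_ns - start_ns;
    t->launches += 1;
}

/* Convert the accumulated nanoseconds to milliseconds for reporting. */
double timer_total_ms(const KernelTimer *t)
{
    return (double)t->total_ns / 1e6;
}
```

One `KernelTimer` per kernel, zero-initialized before the outer loop, would receive a `timer_add` call after each profiled launch. Adding the `total_ns` fields of all timers after the loop gives the overall kernel time, regardless of how often each kernel was launched.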
However, you should consider a few things:
You could introduce a helper function, called like
duration = processEvent(event0)
that waits for the given event, then computes the event duration, releases the event and returns the computed duration.
And most importantly:
Upvotes: 3