Reputation: 7067

OpenCL events ambiguity

According to the clGetEventProfilingInfo documentation, the cl_event returned by clEnqueueNDRangeKernel can be queried for the following timestamps:

CL_PROFILING_COMMAND_QUEUED

when the command identified by event is enqueued in a command-queue by the host.

CL_PROFILING_COMMAND_SUBMIT

when the command identified by event, having been enqueued, is submitted by the host to the device associated with the command-queue.

CL_PROFILING_COMMAND_START

when the command identified by event starts execution on the device.

CL_PROFILING_COMMAND_END

when the command identified by event has finished execution on the device.

Suppose the whole kernel profile is visualized as this sequence of timestamps:

COMMAND_QUEUED -> COMMAND_SUBMIT -> COMMAND_START -> COMMAND_END

and the corresponding timeline:

Queueing -> Submitting -> Executing

Where:

Queueing = COMMAND_SUBMIT - COMMAND_QUEUED
Submitting = COMMAND_START - COMMAND_SUBMIT
Executing = COMMAND_END - COMMAND_START
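
To make the three differences concrete, here is a minimal sketch in C. It assumes the four timestamps have already been read (in nanoseconds) with clGetEventProfilingInfo from an event on a command-queue created with CL_QUEUE_PROFILING_ENABLE. The names event_times, event_phases and split_phases are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the four profiling timestamps of one event,
   in nanoseconds, as they would be read with clGetEventProfilingInfo. */
typedef struct {
    uint64_t queued;  /* CL_PROFILING_COMMAND_QUEUED */
    uint64_t submit;  /* CL_PROFILING_COMMAND_SUBMIT */
    uint64_t start;   /* CL_PROFILING_COMMAND_START  */
    uint64_t end;     /* CL_PROFILING_COMMAND_END    */
} event_times;

/* Hypothetical container for the three intervals between them. */
typedef struct {
    uint64_t queueing;
    uint64_t submitting;
    uint64_t executing;
} event_phases;

static event_phases split_phases(event_times t)
{
    /* Assumption: the timestamps are non-decreasing in this order. */
    assert(t.queued <= t.submit && t.submit <= t.start && t.start <= t.end);

    event_phases p;
    p.queueing   = t.submit - t.queued;  /* Queueing   = SUBMIT - QUEUED */
    p.submitting = t.start  - t.submit;  /* Submitting = START  - SUBMIT */
    p.executing  = t.end    - t.start;   /* Executing  = END    - START  */
    return p;
}
```

For example, timestamps of 1000, 1400, 1900 and 5900 ns split into 400 ns queueing, 500 ns submitting and 4000 ns executing.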

Questions:
Are my equations above correct? If so, what is the real difference between queueing and submitting? In other words, if I want to divide the whole process into COMMUNICATION (offloading) time and COMPUTATION (executing) time, what would their equations be?

Upvotes: 0

Answers (2)

DarkZeros

Reputation: 8410

Are my equations above correct?

Yes.

If so, what is the real difference between queueing and submitting? In other words, if I want to divide the whole process into COMMUNICATION (offloading) time and COMPUTATION (executing) time, what would their equations be?

Queueing:

Time spent waiting for other tasks to finish before the current one can start. In other words, waiting for all the events it depends on to reach the CL_COMPLETE state, or for free resources in the queue it was enqueued on.
Note: CPUs will have 0 queueing time when enqueueing to an idle device, because they are synchronous, while GPUs will ALWAYS have some small queueing time (due to their asynchronous behaviour). This is the reason to pipeline as much work as possible to GPU devices.

Submitting:

Time spent preparing the current task (compiling LLVM, moving buffers, preparing device cores, etc.). It should be small, but not 0.

If you are looking for a formula, only "Submitting" and "Executing" are valid for calculating the overhead of the current task. Ignore queueing, since it does not depend on your task:

Active% = Executing/(Executing+Submitting)
Overhead% = Submitting/(Executing+Submitting)
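
As a rough sketch of these two ratios in C, assuming the Submitting and Executing intervals have already been computed in nanoseconds (the function names active_fraction and overhead_fraction are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Fraction of (Submitting + Executing) spent executing, in [0, 1]. */
static double active_fraction(uint64_t submitting_ns, uint64_t executing_ns)
{
    uint64_t total = submitting_ns + executing_ns;
    assert(total > 0);  /* avoid dividing by zero for an empty interval */
    return (double)executing_ns / (double)total;
}

/* Fraction of (Submitting + Executing) spent on submission overhead. */
static double overhead_fraction(uint64_t submitting_ns, uint64_t executing_ns)
{
    uint64_t total = submitting_ns + executing_ns;
    assert(total > 0);
    return (double)submitting_ns / (double)total;
}
```

With 1000 ns of submitting and 3000 ns of executing, this gives an active fraction of 0.75 and an overhead fraction of 0.25. Multiply by 100 to get percentages.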

Upvotes: 0

Dithermaster

Reputation: 6343

Your interpretation looks essentially correct. QUEUED is when you called the OpenCL API (such as clEnqueueNDRangeKernel). SUBMIT is when the runtime gave the work to the device. START is when it started execution, and END is when the execution finished. There are three intervals between these four timestamps. In the first, the command is idle on the host. In the second, it is idle on the device. In the third, it is executing on the device. If you wish to combine the first two into "communication", add them together (or use COMMAND_START - COMMAND_QUEUED).
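
A small sketch of that two-way split in C, assuming the raw timestamps are already available in nanoseconds (communication_ns and computation_ns are hypothetical helper names):

```c
#include <assert.h>
#include <stdint.h>

/* Communication = COMMAND_START - COMMAND_QUEUED
   (the queueing and submitting intervals added together). */
static uint64_t communication_ns(uint64_t queued, uint64_t start)
{
    assert(queued <= start);  /* assumption: timestamps are ordered */
    return start - queued;
}

/* Computation = COMMAND_END - COMMAND_START. */
static uint64_t computation_ns(uint64_t start, uint64_t end)
{
    assert(start <= end);
    return end - start;
}
```

Note that the SUBMIT timestamp drops out entirely: only QUEUED, START and END are needed for this split.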

Upvotes: 1
