Reputation: 11926
I have an OpenCL kernel batch class that enqueues more than 40 kernels, with a clFinish() between each kernel execution.
Question: When I comment out the clFinish() calls, the program runs 100% faster. Does this mean there is undefined behaviour between kernels when they read and write the same buffers? I suspect the performance increase comes from better utilisation of GPU resources/occupancy. Does using an in-order command queue guarantee a barrier between two kernels, or do I need a clFinish() between them?
It's a simple physics model and no difference is visible between the two versions.
Maybe I just need to use clFinish() at the end?
Edit: This command queue is not using CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE.
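Roughly, the batch does this for each kernel (simplified sketch; the queue/kernel/size names are just placeholders, not the real class members):
for (int i = 0; i < numKernels; ++i) {
    clEnqueueNDRangeKernel(queue, kernels[i], 1, NULL,
                           &globalSize, &localSize, 0, NULL, NULL);
    clFinish(queue); // this per-kernel sync is what I comment out for the speedup
}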
Upvotes: 3
Views: 2808
Reputation: 1020
Perhaps the kernel isn't being run? Enqueuing a kernel is cheap and doesn't require any effort on the part of the host. However, calling clEnqueueNDRangeKernel != running the kernel.
//Start host timer
clEnqueueNDRangeKernel(queue, ...);
//Stop host timer - this doesn't actually time the kernel on the host
vs.
//Start host timer
clEnqueueNDRangeKernel(queue, ...);
clFinish(queue);
//Stop host timer - this does time the kernel on the host
Alternatively, you can use kernel profiling and events.
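For instance, a minimal sketch of event-based timing could look like this (assuming the usual <CL/cl.h> host setup - context, device, kernel, globalSize - already exists, and that the queue is created with CL_QUEUE_PROFILING_ENABLE):
cl_command_queue queue = clCreateCommandQueue(context, device,
                                              CL_QUEUE_PROFILING_ENABLE, NULL);
cl_event evt;
clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &globalSize, NULL, 0, NULL, &evt);
clWaitForEvents(1, &evt);              /* block until the kernel has finished */

cl_ulong start = 0, end = 0;
clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_START, sizeof(start), &start, NULL);
clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_END,   sizeof(end),   &end,   NULL);
printf("kernel time: %.3f ms\n", (end - start) * 1e-6); /* timestamps are in ns */
clReleaseEvent(evt);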
Upvotes: 2
Reputation: 1615
You do not need to use clFinish() between kernel calls if all kernels execute on the same command queue and that queue is in-order, which is exactly your scenario. So you don't need all those clFinish() calls; just keep the last one.
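In other words, a sketch of the suggested pattern (placeholder names for the queue and kernels, not your actual class):
// In-order queue: kernels execute in enqueue order, even if they
// read and write the same buffers, so no per-kernel sync is needed.
for (int i = 0; i < numKernels; ++i) {
    clEnqueueNDRangeKernel(queue, kernels[i], 1, NULL,
                           &globalSize, &localSize, 0, NULL, NULL);
}
clFinish(queue); // single sync at the end, when the host needs the results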
The performance difference comes from a combination of the following:
- It takes time for each clFinish() call to send all queued commands to the device and check their execution status.
- clFinish() calls also create breaks in the work being done by the GPU: the GPU has to wait for the next kernel to be enqueued and sent to the device after clFinish(). Omitting the calls to clFinish() means that kernel calls get sent to the GPU while previous kernels are still executing.
Upvotes: 7