Is there in-code profiling for CUDA programs?

Question

In the OpenCL world there is the function clGetEventProfilingInfo, which returns profiling information for an event, such as its queued, submitted, start, and end times in nanoseconds. It is quite convenient because I can printf that information whenever I want.

For example, with PyOpenCL it is possible to write code like this:

profile = event.profile
print("%gs + %gs" % (1e-9*(profile.end - profile.start), 1e-9*(profile.start - profile.queued)))

which is quite informative for my task.

Is it possible to get such information in code in CUDA, instead of using an external profiling tool such as nvprof?

ApoorvaJ · Accepted Answer

For quick, lightweight timing, you may want to have a look at the cudaEvent API.

Excerpt from the link above:

cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);


cudaMemcpy(d_x, x, N*sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(d_y, y, N*sizeof(float), cudaMemcpyHostToDevice);

cudaEventRecord(start);
saxpy<<<(N+255)/256, 256>>>(N, 2.0f, d_x, d_y);
cudaEventRecord(stop);

cudaMemcpy(y, d_y, N*sizeof(float), cudaMemcpyDeviceToHost);

cudaEventSynchronize(stop);
float milliseconds = 0;
cudaEventElapsedTime(&milliseconds, start, stop);

printf("Elapsed time: %f ms\n", milliseconds);
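
The excerpt leaves out the kernel, the allocations, and error checking. As a rough, self-contained sketch of the same idea, the program below records four events to time the uploads, the kernel, and the download separately, which is about as close as the event API gets to the per-stage printout in the question. The names CUDA_CHECK and run_saxpy_timed, the input values, and the problem size are illustrative assumptions, not part of the CUDA API or of the linked article.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Hypothetical helper: runs saxpy on n elements and stores per-stage times
// in milliseconds: [0] host-to-device copies, [1] kernel, [2] device-to-host
// copy. Returns y[0] after the kernel so the caller can sanity-check it.
float run_saxpy_timed(int n, float stage_ms[3]) {
    const size_t bytes = (size_t)n * sizeof(float);
    float *x = (float *)malloc(bytes);
    float *y = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    float *d_x = NULL, *d_y = NULL;
    CUDA_CHECK(cudaMalloc(&d_x, bytes));
    CUDA_CHECK(cudaMalloc(&d_y, bytes));

    // Four events on the default stream delimit the three stages.
    cudaEvent_t ev[4];
    for (int i = 0; i < 4; ++i) CUDA_CHECK(cudaEventCreate(&ev[i]));

    CUDA_CHECK(cudaEventRecord(ev[0]));
    CUDA_CHECK(cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_y, y, bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaEventRecord(ev[1]));
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);
    CUDA_CHECK(cudaGetLastError());  // catches launch configuration errors
    CUDA_CHECK(cudaEventRecord(ev[2]));
    CUDA_CHECK(cudaMemcpy(y, d_y, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaEventRecord(ev[3]));

    // Wait until the last event has completed before reading any timings.
    CUDA_CHECK(cudaEventSynchronize(ev[3]));
    for (int i = 0; i < 3; ++i)
        CUDA_CHECK(cudaEventElapsedTime(&stage_ms[i], ev[i], ev[i + 1]));

    float first = y[0];
    for (int i = 0; i < 4; ++i) CUDA_CHECK(cudaEventDestroy(ev[i]));
    CUDA_CHECK(cudaFree(d_x));
    CUDA_CHECK(cudaFree(d_y));
    free(x);
    free(y);
    return first;
}

int main() {
    const char *stage[3] = {"host-to-device copies", "saxpy kernel",
                            "device-to-host copy"};
    float ms[3] = {0.0f, 0.0f, 0.0f};
    float y0 = run_saxpy_timed(1 << 20, ms);
    for (int i = 0; i < 3; ++i) printf("%-22s %.3f ms\n", stage[i], ms[i]);
    printf("y[0] = %f\n", y0);
    return 0;
}
```

Saved under a hypothetical name such as timing.cu, this should build with nvcc timing.cu -o timing. Note that cudaEventElapsedTime only reports the time between two recorded events, in milliseconds with a resolution of roughly half a microsecond; it does not expose separate queued and submitted timestamps the way clGetEventProfilingInfo does.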

If you want a more full-featured profiling library, you should look at CUPTI (the CUDA Profiling Tools Interface).
