petRUShka

Reputation: 10162

Is there a way to profile a CUDA program from within the code?

In the OpenCL world there is a function, clGetEventProfilingInfo, which returns profiling information for an event: its queued, submitted, start, and end times in nanoseconds. This is quite convenient because I can print that information whenever I want.

For example, with PyOpenCL it is possible to write code like this:

profile = event.profile
print("%gs + %gs" % (1e-9*(profile.end - profile.start), 1e-9*(profile.start - profile.queued)))

which is quite informative for my task.

Is it possible to get such information in CUDA code, instead of using an external profiling tool such as nvprof?

Upvotes: 0

Views: 538

Answers (2)

ApoorvaJ

Reputation: 830

For quick, lightweight timing, you may want to have a look at the cudaEvent API.

Excerpt from the link above:

cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);


cudaMemcpy(d_x, x, N*sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(d_y, y, N*sizeof(float), cudaMemcpyHostToDevice);

cudaEventRecord(start);
saxpy<<<(N+255)/256, 256>>>(N, 2.0f, d_x, d_y);
cudaEventRecord(stop);

cudaMemcpy(y, d_y, N*sizeof(float), cudaMemcpyDeviceToHost);

cudaEventSynchronize(stop);
float milliseconds = 0;
cudaEventElapsedTime(&milliseconds, start, stop);

printf("Elapsed time: %f ms\n", milliseconds);
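The excerpt above leaves out the kernel, the allocations, and error handling. A self-contained sketch of the same event-based timing might look like the following. The CUDA_CHECK macro and the time_saxpy_ms function are hypothetical helpers introduced only for this illustration, and the buffers are zero-filled on the device instead of being copied from host arrays.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): abort if a runtime call fails.
#define CUDA_CHECK(call)                                             \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",             \
                    cudaGetErrorString(err_), __FILE__, __LINE__);   \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Hypothetical wrapper: returns the GPU time of one saxpy launch in ms.
float time_saxpy_ms(int n)
{
    const size_t bytes = (size_t)n * sizeof(float);
    float *d_x = nullptr, *d_y = nullptr;
    CUDA_CHECK(cudaMalloc((void **)&d_x, bytes));
    CUDA_CHECK(cudaMalloc((void **)&d_y, bytes));
    CUDA_CHECK(cudaMemset(d_x, 0, bytes));
    CUDA_CHECK(cudaMemset(d_y, 0, bytes));

    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));

    // Both events are recorded in the default stream, around the kernel.
    CUDA_CHECK(cudaEventRecord(start));
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);
    CUDA_CHECK(cudaGetLastError());   // catches launch configuration errors
    CUDA_CHECK(cudaEventRecord(stop));

    // Block the host until the stop event has actually completed.
    CUDA_CHECK(cudaEventSynchronize(stop));
    float ms = 0.0f;
    CUDA_CHECK(cudaEventElapsedTime(&ms, start, stop));

    CUDA_CHECK(cudaEventDestroy(start));
    CUDA_CHECK(cudaEventDestroy(stop));
    CUDA_CHECK(cudaFree(d_x));
    CUDA_CHECK(cudaFree(d_y));
    return ms;
}

int main()
{
    printf("Elapsed time: %f ms\n", time_saxpy_ms(1 << 20));
    return 0;
}
```

Unlike the OpenCL event profile, a pair of CUDA events only gives the elapsed time between two points in a stream. It does not report separate queued and submitted timestamps.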

If you want a more full-featured profiling library, you should look at CUPTI.

Upvotes: 1

Aznaveh

Reputation: 578

So far, I know of no tool other than nvprof that can collect profiling data. However, you can control nvprof from within your code. Take a look at this NVIDIA document. You can use cudaProfilerStart() and cudaProfilerStop() to profile just a part of your code. They are declared in cuda_profiler_api.h; the driver API equivalents, cuProfilerStart() and cuProfilerStop(), are declared in cudaProfiler.h.
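As a rough illustration of restricting profiling to one region, here is a minimal sketch using the runtime API calls cudaProfilerStart() and cudaProfilerStop() from cuda_profiler_api.h. The scale kernel and the run_profiled_region function are hypothetical names made up for this example, and error handling is reduced to a single check.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_profiler_api.h>

// Hypothetical kernel used only to have something to profile.
__global__ void scale(int n, float a, float *x)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = a * x[i];
}

// Hypothetical wrapper: returns 0 on success, 1 on failure.
int run_profiled_region()
{
    const int n = 1 << 20;
    const size_t bytes = (size_t)n * sizeof(float);
    float *d_x = nullptr;
    if (cudaMalloc((void **)&d_x, bytes) != cudaSuccess) return 1;
    cudaMemset(d_x, 0, bytes);

    // Warm-up launch, deliberately left outside the profiled region.
    scale<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x);
    cudaDeviceSynchronize();

    // Only the work between these two calls is marked for the profiler.
    cudaProfilerStart();
    scale<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x);
    cudaDeviceSynchronize();
    cudaProfilerStop();

    cudaError_t err = cudaGetLastError();
    cudaFree(d_x);
    return err == cudaSuccess ? 0 : 1;
}

int main()
{
    int rc = run_profiled_region();
    printf("profiled region %s\n", rc == 0 ? "finished" : "failed");
    return rc;
}
```

These calls only mark the region of interest; the data itself is still collected by the external profiler. With nvprof, the program would typically be launched as nvprof --profile-from-start off ./app so that collection begins at cudaProfilerStart().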

Upvotes: 1
