avatus

Reputation: 11

How can I find out which CUDA APIs a program called, without looking at the source code?

The architecture is something like this:

python ----[call]----> TensorFlow for GPU ----[call]----> CUDA SDK / CUDA ----[call]----> GPU binary that executes the job, or something like that.

I have tried using nvvp to analyze the Python script directly, but the result consumed about 4.6 GB of memory and the nvvp GUI froze, so I have no idea how to proceed.

Is there a way to find out exactly which CUDA APIs the whole program called? This problem is not specific to TensorFlow; I need a general method so that I can later test all the relevant APIs and decide which GPU is suitable for our program.

Upvotes: 1

Views: 229

Answers (1)

talonmies

Reputation: 72339

The simplest way to do this is with the command-line profiling tool nvprof, using the API summary option, like this:

$ nvprof --print-api-summary ./a.out 
3 6 7 5 3 5 6 2 9 
1 2 7 0 9 3 6 0 6 
2 6 1 8 7 9 2 0 2 
3 7 5 9 2 2 8 9 7 
==18840== NVPROF is profiling process 18840, command: ./a.out
0: 3.000000 5.000000 6.000000 2.000000
1: 9.000000 3.000000 6.000000 0.000000
2: 7.000000 9.000000 2.000000 0.000000
3: 2.000000 2.000000 8.000000 9.000000
3 6 7 5 3 3.142 6 2 9 
1 2 7 0 9 3.142 6 0 6 
2 6 1 8 7 3.142 2 0 2 
3 7 5 9 2 3.142 8 9 7 
==18840== Profiling application: ./a.out
==18840== Profiling result:
==18840== API calls:
Time(%)      Time     Calls       Avg       Min       Max  Name
 41.57%  117.77ms         1  117.77ms  117.77ms  117.77ms  cudaMallocPitch
 31.45%  89.096ms         1  89.096ms  89.096ms  89.096ms  cudaFree
 26.61%  75.398ms         1  75.398ms  75.398ms  75.398ms  cudaDeviceReset
  0.14%  390.33us         1  390.33us  390.33us  390.33us  cudaLaunch
  0.09%  252.51us        91  2.7740us     247ns  98.999us  cuDeviceGetAttribute
  0.08%  225.51us         1  225.51us  225.51us  225.51us  cuDeviceTotalMem
  0.04%  101.02us         1  101.02us  101.02us  101.02us  cudaDeviceSynchronize
  0.02%  43.777us         2  21.888us  21.009us  22.768us  cudaMemcpy2D
  0.01%  32.867us         1  32.867us  32.867us  32.867us  cuDeviceGetName
  0.00%  4.1070us         4  1.0260us     188ns  3.2290us  cudaSetupArgument
  0.00%  3.3560us         3  1.1180us     332ns  2.4330us  cuDeviceGetCount
  0.00%  2.1280us         3     709ns     265ns  1.2330us  cuDeviceGet
  0.00%  1.2200us         1  1.2200us  1.2200us  1.2200us  cudaConfigureCall
  0.00%     885ns         1     885ns     885ns     885ns  cudaPeekAtLastError

This shows all the driver and runtime API calls which the program executed over the life of the CUDA context associated with the application.
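
For reference, here is a minimal CUDA C sketch (an assumption, not the actual a.out profiled above) that exercises roughly the same set of runtime API calls as the summary shows:

#include <cstdio>
#include <cuda_runtime.h>

#define N 4

// Scale every element of a pitched 2D array in place.
__global__ void scale(float *data, size_t pitch, int n, float factor)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float *rowptr = (float *)((char *)data + row * pitch);
        rowptr[col] *= factor;
    }
}

int main(void)
{
    float host[N][N];
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            host[i][j] = (float)(i * N + j);

    float *dev = NULL;
    size_t pitch = 0;
    cudaMallocPitch((void **)&dev, &pitch, N * sizeof(float), N);  // cudaMallocPitch
    cudaMemcpy2D(dev, pitch, host, N * sizeof(float),
                 N * sizeof(float), N, cudaMemcpyHostToDevice);    // cudaMemcpy2D (H->D)

    scale<<<dim3(1, 1), dim3(16, 16)>>>(dev, pitch, N, 2.0f);      // kernel launch (cudaLaunch)
    cudaDeviceSynchronize();                                       // cudaDeviceSynchronize

    cudaMemcpy2D(host, N * sizeof(float), dev, pitch,
                 N * sizeof(float), N, cudaMemcpyDeviceToHost);    // cudaMemcpy2D (D->H)

    for (int i = 0; i < N; ++i) {
        printf("%d:", i);
        for (int j = 0; j < N; ++j)
            printf(" %f", host[i][j]);
        printf("\n");
    }

    cudaFree(dev);       // cudaFree
    cudaDeviceReset();   // cudaDeviceReset
    return 0;
}

To apply the same approach to the Python/TensorFlow case in the question, nvprof can simply wrap the Python interpreter (the script name below is only a placeholder; depending on how the framework launches its work, the --profile-child-processes option may also be needed):

$ nvprof --print-api-summary python my_script.py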

Upvotes: 1
