Reputation: 258
I am trying to read performance counters with nvprof while executing two kernels concurrently, using the following command:
nvprof --concurrent-kernels on --events fb_subp0_write_sectors ./myprogram
However, when I do this, kernel execution appears to be serialized. What I want to measure is exactly how the kernels perform when they run concurrently.
Is it possible at all to read performance counters while kernels are running concurrently? I do not necessarily need per-kernel data; aggregate data is perfectly fine.
I am running on a Kepler GPU with compute capability 3.5.
Upvotes: 1
Views: 106
Reputation: 11529
No. nvprof v7.5 and earlier do not support collecting performance counters in a way that is useful for investigating the performance of concurrent kernels. I recommend that you submit a feature request through the NVIDIA developer program. This is on the team's task list, and customer feedback helps move features up the list.
Upvotes: 3