Curious

Reputation: 152

Can execution of CUDA kernels from two contexts overlap?

From this, it appears that two kernels from different contexts cannot execute concurrently. In this regard, I am confused when reading CUPTI activity traces from two applications. The traces show kernel_start_timestamp, kernel_end_timestamp and duration (which is kernel_end_timestamp - kernel_start_timestamp).

Application 1: ....... 8024328958006530 8024329019421612 61415082 .......

Application 2: ....... 8024328940410543 8024329048839742 108429199

To make the long timestamp and duration more readable:

Application 1: kernel X (61.415 ms) ran from xxxxx28.958 s to xxxxx29.019 s

Application 2: kernel Y (108.429 ms) ran from xxxxx28.940 s to xxxxx29.049 s

So, the execution of kernel X completely overlaps with that of kernel Y.

I am using the /path_to_cuda_install/extras/CUPTI/sample/activity_trace_async for tracing the applications. I modified CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE to 1024 and CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT to 1. I have only enabled tracing for CUPTI_ACTIVITY_KIND_MEMCPY, CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL and CUPTI_ACTIVITY_KIND_OVERHEAD. My applications are calling cuptiActivityFlushAll(0) once in each of their respective logical timesteps.
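For reference, here is a minimal sketch of the setup described above, modeled loosely on the activity_trace_async sample. The buffer callbacks, the host buffer size, and the CUpti_ActivityKernel2 record version are my assumptions (the record struct version varies across CUPTI releases); the attribute values mirror the ones in the question:

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cupti.h>

// Hypothetical buffer callbacks, modeled on the activity_trace_async sample.
static void CUPTIAPI bufferRequested(uint8_t **buffer, size_t *size,
                                     size_t *maxNumRecords)
{
    *size = 32 * 1024;                  // assumed host buffer size
    *buffer = (uint8_t *)malloc(*size);
    *maxNumRecords = 0;                 // 0 = as many records as fit
}

static void CUPTIAPI bufferCompleted(CUcontext ctx, uint32_t streamId,
                                     uint8_t *buffer, size_t size,
                                     size_t validSize)
{
    CUpti_Activity *record = NULL;

    // Walk the completed records; print kernel start/end timestamps (ns).
    while (cuptiActivityGetNextRecord(buffer, validSize, &record) == CUPTI_SUCCESS) {
        if (record->kind == CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL) {
            // Record layout from CUDA 6.5-era CUPTI; newer toolkits use
            // later CUpti_ActivityKernel* versions.
            CUpti_ActivityKernel2 *k = (CUpti_ActivityKernel2 *)record;
            printf("%s %llu %llu %llu\n", k->name,
                   (unsigned long long)k->start,
                   (unsigned long long)k->end,
                   (unsigned long long)(k->end - k->start));
        }
    }
    free(buffer);
}

int main(void)
{
    // Enable the activity kinds mentioned above.
    cuptiActivityEnable(CUPTI_ACTIVITY_KIND_MEMCPY);
    cuptiActivityEnable(CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL);
    cuptiActivityEnable(CUPTI_ACTIVITY_KIND_OVERHEAD);
    cuptiActivityRegisterCallbacks(bufferRequested, bufferCompleted);

    // Shrink the device buffer and pool limit as described above.
    size_t attrSize = sizeof(size_t);
    size_t bufferSize = 1024;
    size_t poolLimit = 1;
    cuptiActivitySetAttribute(CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE,
                              &attrSize, &bufferSize);
    cuptiActivitySetAttribute(CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT,
                              &attrSize, &poolLimit);

    // ... launch kernels / run one logical timestep ...

    cuptiActivityFlushAll(0);   // drain completed records once per timestep
    return 0;
}
```

With this in place, each cuptiActivityFlushAll(0) call drains the completed records through bufferCompleted.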

Are these erroneous CUPTI values that I am seeing due to improper usage or is it something else?

Clarification: MPS is not enabled; running on a single GPU.

UPDATE: Bug filed; this seems to be a known problem in CUDA 6.5. Waiting for a chance to test with CUDA 7 (the GPU is shared between multiple users, so I need a window of inactivity for a temporary switch to CUDA 7).

Upvotes: 1

Views: 372

Answers (1)

Iman

Reputation: 188

I don't know how to set up CUPTI activity traces, but two kernels can share a time span on a single GPU even without the MPS server, though only one of them will actually be running on the GPU at any given moment.

If the CUDA MPS server is not in use, kernels from different contexts cannot overlap. Assuming you are not using the MPS server, a time-sliced scheduler decides which context gets to access the GPU at any given moment: without MPS, a context can only access the GPU in the time slots that the scheduler assigns to it. Thus, at any instant, only kernels from a single context are running on the GPU.

Note that it is possible for multiple kernels to share a time span with each other on a GPU, but even within that span, only kernels from a single context can access the GPU's resources at any instant (I am also assuming that you are using a single GPU). See the sketch below.
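To make the time-span-sharing point concrete, here is a hedged sketch (a hypothetical spin kernel, not your actual workload): run two instances of this program at the same time, and on GPUs that can switch contexts during kernel execution, the two kernels' reported start-to-end windows can overlap in wall-clock time even though only one context's work executes at any instant. On older GPUs that only switch contexts at kernel boundaries, two long single kernels will simply serialize instead.

```cuda
#include <cuda_runtime.h>

// Hypothetical spin kernel: busy-waits for roughly `cycles` GPU clock cycles.
__global__ void spin(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) {
        // spin
    }
}

int main(void)
{
    // Launch one long-running kernel. Two instances of this program
    // (two processes, hence two contexts) give each kernel a wall-clock
    // window that can overlap with the other's, even though the
    // time-sliced scheduler grants the GPU to one context at a time.
    spin<<<1, 1>>>(1000000000LL);   // roughly a second on many GPUs
    cudaDeviceSynchronize();
    return 0;
}
```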

For more information, you can also check the MPS Service documentation.

Upvotes: 1
