Curious

Reputation: 152

Can execution of CUDA kernels from two contexts overlap?

From this, it appears that two kernels from different contexts cannot execute concurrently. In this regard, I am confused when reading CUPTI activity traces from two applications. The traces show kernel_start_timestamp, kernel_end_timestamp and duration (which is kernel_end_timestamp - kernel_start_timestamp).

Application 1: ....... 8024328958006530 8024329019421612 61415082 .......

Application 2: ....... 8024328940410543 8024329048839742 108429199

To make the long timestamp and duration more readable:

Application 1: kernel X (61.415 ms) ran from xxxxx28.958 s to xxxxx29.019 s

Application 2: kernel Y (108.429 ms) ran from xxxxx28.940 s to xxxxx29.049 s

So, the execution of kernel X completely overlaps with that of kernel Y.

I am using the /path_to_cuda_install/extras/CUPTI/sample/activity_trace_async for tracing the applications. I modified CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE to 1024 and CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT to 1. I have only enabled tracing for CUPTI_ACTIVITY_KIND_MEMCPY, CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL and CUPTI_ACTIVITY_KIND_OVERHEAD. My applications are calling cuptiActivityFlushAll(0) once in each of their respective logical timesteps.
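For reference, here is a minimal sketch of the setup described above, modeled loosely on the activity_trace_async sample. The buffer callbacks, the host buffer size, and the CUpti_ActivityKernel2 record version are my assumptions (the record struct version varies across CUPTI releases); the attribute values mirror the ones in the question:

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cupti.h>

// Hypothetical buffer callbacks, modeled on the activity_trace_async sample.
static void CUPTIAPI bufferRequested(uint8_t **buffer, size_t *size,
                                     size_t *maxNumRecords)
{
    *size = 32 * 1024;                  // assumed host buffer size
    *buffer = (uint8_t *)malloc(*size);
    *maxNumRecords = 0;                 // 0 = as many records as fit
}

static void CUPTIAPI bufferCompleted(CUcontext ctx, uint32_t streamId,
                                     uint8_t *buffer, size_t size,
                                     size_t validSize)
{
    CUpti_Activity *record = NULL;

    // Walk the completed records; print kernel start/end timestamps (ns).
    while (cuptiActivityGetNextRecord(buffer, validSize, &record) == CUPTI_SUCCESS) {
        if (record->kind == CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL) {
            // Record layout from CUDA 6.5-era CUPTI; newer toolkits use
            // later CUpti_ActivityKernel* versions.
            CUpti_ActivityKernel2 *k = (CUpti_ActivityKernel2 *)record;
            printf("%s %llu %llu %llu\n", k->name,
                   (unsigned long long)k->start,
                   (unsigned long long)k->end,
                   (unsigned long long)(k->end - k->start));
        }
    }
    free(buffer);
}

int main(void)
{
    // Enable the activity kinds mentioned above.
    cuptiActivityEnable(CUPTI_ACTIVITY_KIND_MEMCPY);
    cuptiActivityEnable(CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL);
    cuptiActivityEnable(CUPTI_ACTIVITY_KIND_OVERHEAD);
    cuptiActivityRegisterCallbacks(bufferRequested, bufferCompleted);

    // Shrink the device buffer and pool limit as described above.
    size_t attrSize = sizeof(size_t);
    size_t bufferSize = 1024;
    size_t poolLimit = 1;
    cuptiActivitySetAttribute(CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE,
                              &attrSize, &bufferSize);
    cuptiActivitySetAttribute(CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT,
                              &attrSize, &poolLimit);

    // ... launch kernels / run one logical timestep ...

    cuptiActivityFlushAll(0);   // drain completed records once per timestep
    return 0;
}
```

With this in place, each cuptiActivityFlushAll(0) call drains the completed records through bufferCompleted.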

Are these erroneous CUPTI values that I am seeing due to improper usage or is it something else?

Clarification: MPS is not enabled; running on a single GPU.

UPDATE: Bug filed; this seems to be a known problem in CUDA 6.5. Waiting for a chance to test with CUDA 7 (the GPU is shared between multiple users, so I need a window of inactivity for a temporary switch to CUDA 7).

Upvotes: 1

Views: 372

Answers (1)

Iman

Reputation: 188

I don't know how to set up CUPTI activity traces, but two kernels can share a time span on a single GPU even without the MPS server, though only one of them will actually be running on the GPU at any given moment.

If the CUDA MPS server is not in use, kernels from different contexts cannot overlap. Assuming you are not using the MPS server, a time-sliced scheduler decides which context gets to access the GPU at any given moment: without MPS, a context can only access the GPU in the time slots that the scheduler assigns to it. Thus, at any instant, only kernels from a single context are running on the GPU.

Note that it is possible for multiple kernels to share a time span with each other on a GPU, but even within that span, only kernels from a single context can access the GPU's resources at any instant (I am also assuming that you are using a single GPU). See the sketch below.
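To make the time-span-sharing point concrete, here is a hedged sketch (a hypothetical spin kernel, not your actual workload): run two instances of this program at the same time, and on GPUs that can switch contexts during kernel execution, the two kernels' reported start-to-end windows can overlap in wall-clock time even though only one context's work executes at any instant. On older GPUs that only switch contexts at kernel boundaries, two long single kernels will simply serialize instead.

```cuda
#include <cuda_runtime.h>

// Hypothetical spin kernel: busy-waits for roughly `cycles` GPU clock cycles.
__global__ void spin(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) {
        // spin
    }
}

int main(void)
{
    // Launch one long-running kernel. Two instances of this program
    // (two processes, hence two contexts) give each kernel a wall-clock
    // window that can overlap with the other's, even though the
    // time-sliced scheduler grants the GPU to one context at a time.
    spin<<<1, 1>>>(1000000000LL);   // roughly a second on many GPUs
    cudaDeviceSynchronize();
    return 0;
}
```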

For more information, you can also check the MPS Service documentation.

Upvotes: 1
