hash join
hash join

Reputation: 83

What's the differences between the kernel fusion and persistent thread?

In CUDA programming, I try to reduce the synchronization overhead between the off-chip memory and on-chip memory if there is data dependency between two kernels? What's the differences between these two techniques?

Upvotes: 0

Views: 2831

Answers (1)

Robert Crovella
Robert Crovella

Reputation: 151849

The idea behind kernel fusion is to take two (or more) discrete operations, that could be realized (and might already be realized) in separate kernels, and combine them so the operations all happen in a single kernel.

The benefits of this may or may not seem obvious, so I refer you to this writeup.

Persistent threads/Persistent kernel is a kernel design strategy that allows the kernel to continue execution indefinitely. Typical "ordinary" kernel design focuses on solving a particular task, and when that task is done, the kernel exits (at the closing curly-brace of your kernel code).

A persistent kernel however has a governing loop in it that only ends when signaled - otherwise it runs indefinitely. People often connect this with the producer-consumer model of application design. Something (host code) produces data, and your persistent kernel consumes that data and produces results. This producer-consumer model can run indefinitely. When there is no data to consume, the consumer (your persistent kernel) simply waits in a loop, for new data to be presented.

Persistent kernel design has a number of important considerations, which I won't try to list here but instead refer you to this longer writeup/example.

Benefits:

  1. Kernel fusion may combine work into a single kernel so as to increase performance by reduction of unnecessary loads and stores - because the data being operated on can be preserved in-place in device registers or shared memory.

  2. Persistent kernels may have a variety of benefits. They may possibly reduce the latency associated with processing data, because the CUDA kernel launch overhead is no longer necessary. However another possible performance factor may be the ability to retain state (similar to kernel fusion) in device registers or shared memory.

Kernel fusion doesn't necessarily imply a persistent kernel. You may simply be combining a set of tasks into a single kernel. A persistent kernel doesn't necessarily imply fusion of separate computation tasks - there may be only 1 "task" that you are performing in a governing "consumer" loop.

But there is obviously considerable conceptual overlap between the two ideas.

Upvotes: 3

Related Questions