troore

Reputation: 797

For CUDA, does a thread stay on a single SP on GPU?

When programming CUDA, we all know that a thread block will be scheduled on an SM and will not migrate to other SMs. As for a thread of a thread block, will it stay on a single SP throughout its execution, or could its instructions be scheduled on different SPs arbitrarily?

Upvotes: 2

Views: 690

Answers (3)

Greg Smith

Reputation: 11529

A CUDA core is a pipelined execution unit capable of executing single-precision floating-point and integer instructions. Other common names for a CUDA core are ALU, math datapath, data pipeline, and so on. The CUDA core is the execute and write-back stage of the SM.

CUDA cores are one of several types of execution units in an SM. Others include the Load Store Units (LSU), Branch Units, Double Precision Units, and Special Function Units.

EDIT:

A CUDA core does not manage threads/warps. The SM front end fetches instructions, decodes them, reads the registers, and dispatches (issues) the warp (instruction + registers) to an FP/INT execution unit (CUDA core) or to one of the other types of execution units.

Think of a CUDA core as a classic microprocessor's pipelined execution unit (ADU, ALU, AVX, ...).

Upvotes: 3

Xiaolong Xie

Reputation: 121

It is not necessary to restrict one thread to one fixed SP, and I believe it is easier and more efficient to freely issue threads to any SP within a fixed group of SPs (that is, one SM may be divided into several groups to simplify the design and minimize interconnect).

Upvotes: 1

Tom

Reputation: 21108

The programming model does not restrict a thread to a single CUDA core. A thread block must execute on a single SM because threads within a block communicate through shared memory, and shared memory is physically located on (and only accessible from) the SM where the block is resident. That restriction is part of what allows GPUs to scale from mobile devices to supercomputers.
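
As a minimal sketch of why this matters (illustrative code, not from the original answer): threads of one block exchange partial sums through __shared__ memory and synchronize with __syncthreads(), which only works because the whole block is resident on one SM. The kernel name and the assumption of 256 threads per block are hypothetical.

    // Illustrative sketch: threads of one block cooperate through shared
    // memory, which lives on the SM the block is resident on.
    // Assumes the kernel is launched with 256 threads per block.
    __global__ void blockSum(const float *in, float *out, int n)
    {
        __shared__ float partial[256];          // one slot per thread in the block

        int tid = threadIdx.x;
        int idx = blockIdx.x * blockDim.x + tid;

        partial[tid] = (idx < n) ? in[idx] : 0.0f;
        __syncthreads();                        // every thread of the block must reach this point

        // Tree reduction within the block; each step reads values written
        // by other threads of the same block through shared memory.
        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
            if (tid < stride)
                partial[tid] += partial[tid + stride];
            __syncthreads();
        }

        if (tid == 0)
            out[blockIdx.x] = partial[0];       // one partial sum per block
    }

Launched as blockSum<<<numBlocks, 256>>>(in, out, n), each block produces its partial sum correctly regardless of which SM it was placed on, and nothing in the code refers to an individual CUDA core.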

Why would it be helpful to know whether a thread executes on a single CUDA core? The CUDA model targets throughput computing: when one thread (warp) is stalled on a long-latency operation, the hardware immediately switches to another thread (warp) that is ready to issue, filling the gap. As a result, it should not matter where any given thread is executing.
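
If you want to observe the placement the hardware actually chooses, a small sketch like the one below (illustrative; it uses the documented %smid PTX special register read via inline assembly) prints the SM each block lands on. Note that there is no comparable register for an individual CUDA core, which reflects the point above: thread-to-core placement is simply not exposed by the model.

    #include <cstdio>

    // Returns the SM the calling thread is currently running on,
    // using the documented %smid PTX special register.
    __device__ unsigned int currentSm()
    {
        unsigned int smid;
        asm("mov.u32 %0, %%smid;" : "=r"(smid));
        return smid;
    }

    __global__ void whereAmI()
    {
        if (threadIdx.x == 0)
            printf("block %d runs on SM %u\n", blockIdx.x, currentSm());
    }

Launching whereAmI<<<8, 32>>>() shows how blocks are distributed across SMs, but says nothing about individual CUDA cores, because that level of placement is invisible to the programmer.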

Upvotes: 6
