Reputation: 4765
There is plenty of documentation and many publications on CUDA/Nvidia GPUs, but I have never encountered anything about TLBs.
Do GPUs use TLBs similar to those in CPUs (and, therefore, have TLB hits/misses)?
How are TLB misses handled? By the CUDA driver or by the GPU hardware?
Are there cases where TLB misses cause a significant/noticeable performance impact?
Upvotes: 1
Views: 780
Reputation: 13310
A TLB does exist. I am not aware of any official documentation, but its size can be determined via reverse engineering. See, for example, Zhe Jia et al.: Dissecting the NVidia Turing T4 GPU via Microbenchmarking:
[…] within the available global memory size, there are two levels of TLB on the Turing GPUs. The L1 TLB has 2 MiB page entries and 32 MiB coverage. The coverage of the L2 TLB is about 8192 MiB, which is the same as Volta.
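The usual way such numbers are reverse engineered is with a pointer-chase microbenchmark: a single thread follows a dependent chain of loads where each hop lands on a different page, and the working-set size is swept until the average access latency jumps, marking the TLB coverage limit. Below is a minimal sketch of that idea, not the paper's actual code; the 2 MiB stride, the working-set range, and the idea that the latency jump corresponds to TLB coverage are assumptions based on the quoted figures, and exact results will differ across GPUs.

```cuda
// Hedged sketch of a TLB pointer-chase microbenchmark (not the paper's code).
// Assumption: one access per 2 MiB page; sweep the working set and watch for
// a jump in cycles/access once the page count exceeds TLB coverage.
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

// One dependent load per step so latency cannot be hidden by the hardware.
__global__ void chase(const uint64_t* buf, size_t steps,
                      uint64_t* sink, long long* cycles)
{
    uint64_t idx = 0;
    long long start = clock64();
    for (size_t i = 0; i < steps; ++i)
        idx = buf[idx];               // each hop lands on a new page
    long long stop = clock64();
    *sink = idx;                      // keep the chain from being optimized away
    *cycles = (stop - start) / (long long)steps;
}

int main()
{
    const size_t stride_bytes = 2ull << 20;            // assumed 2 MiB page stride
    const size_t stride_elems = stride_bytes / sizeof(uint64_t);

    // Sweep the working-set size; a latency jump marks the TLB coverage limit.
    for (size_t ws_mib = 8; ws_mib <= 512; ws_mib *= 2) {
        size_t bytes = ws_mib << 20;
        size_t pages = bytes / stride_bytes;

        uint64_t* h = (uint64_t*)malloc(bytes);
        // Build a circular chain that visits one element per 2 MiB page.
        for (size_t p = 0; p < pages; ++p)
            h[p * stride_elems] = ((p + 1) % pages) * stride_elems;

        uint64_t *d_buf, *d_sink;
        long long* d_cycles;
        cudaMalloc(&d_buf, bytes);
        cudaMalloc(&d_sink, sizeof(uint64_t));
        cudaMalloc(&d_cycles, sizeof(long long));
        cudaMemcpy(d_buf, h, bytes, cudaMemcpyHostToDevice);

        // Many laps around the chain to amortize timing overhead.
        chase<<<1, 1>>>(d_buf, pages * 64, d_sink, d_cycles);
        cudaDeviceSynchronize();

        long long cycles;
        cudaMemcpy(&cycles, d_cycles, sizeof(long long), cudaMemcpyDeviceToHost);
        printf("working set %4zu MiB: ~%lld cycles per access\n", ws_mib, cycles);

        cudaFree(d_buf); cudaFree(d_sink); cudaFree(d_cycles);
        free(h);
    }
    return 0;
}
```

A single thread with dependent loads is used on purpose: with no independent work to overlap, each miss is exposed in full, so the step in the latency curve is easy to see.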
Upvotes: 2