Reputation: 4765
There is plenty of documentation and many publications on CUDA/Nvidia GPUs, but I have never encountered anything about TLBs.
Do GPUs use TLBs similar to those in CPUs (and, therefore, have TLB hits/misses)?
How are TLB misses handled? By the CUDA driver or by the GPU hardware?
Are there cases where TLB misses cause a significant/noticeable performance impact?
Upvotes: 1
Views: 780
Reputation: 13310
A TLB does exist. I am not aware of any official documentation, but its size can be determined via reverse engineering. See, for example, Zhe Jia et al.: Dissecting the NVidia Turing T4 GPU via Microbenchmarking:
[…] within the available global memory size, there are two levels of TLB on the Turing GPUs. The L1 TLB has 2 MiB page entries and 32 MiB coverage. The coverage of the L2 TLB is about 8192 MiB, which is the same as Volta.
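The usual way such numbers are reverse engineered is with a pointer-chase microbenchmark: a single thread follows a dependent chain of loads where each hop lands on a different page, and the working-set size is swept until the average access latency jumps, marking the TLB coverage limit. Below is a minimal sketch of that idea, not the paper's actual code; the 2 MiB stride, the working-set range, and the idea that the latency jump corresponds to TLB coverage are assumptions based on the quoted figures, and exact results will differ across GPUs.

```cuda
// Hedged sketch of a TLB pointer-chase microbenchmark (not the paper's code).
// Assumption: one access per 2 MiB page; sweep the working set and watch for
// a jump in cycles/access once the page count exceeds TLB coverage.
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

// One dependent load per step so latency cannot be hidden by the hardware.
__global__ void chase(const uint64_t* buf, size_t steps,
                      uint64_t* sink, long long* cycles)
{
    uint64_t idx = 0;
    long long start = clock64();
    for (size_t i = 0; i < steps; ++i)
        idx = buf[idx];               // each hop lands on a new page
    long long stop = clock64();
    *sink = idx;                      // keep the chain from being optimized away
    *cycles = (stop - start) / (long long)steps;
}

int main()
{
    const size_t stride_bytes = 2ull << 20;            // assumed 2 MiB page stride
    const size_t stride_elems = stride_bytes / sizeof(uint64_t);

    // Sweep the working-set size; a latency jump marks the TLB coverage limit.
    for (size_t ws_mib = 8; ws_mib <= 512; ws_mib *= 2) {
        size_t bytes = ws_mib << 20;
        size_t pages = bytes / stride_bytes;

        uint64_t* h = (uint64_t*)malloc(bytes);
        // Build a circular chain that visits one element per 2 MiB page.
        for (size_t p = 0; p < pages; ++p)
            h[p * stride_elems] = ((p + 1) % pages) * stride_elems;

        uint64_t *d_buf, *d_sink;
        long long* d_cycles;
        cudaMalloc(&d_buf, bytes);
        cudaMalloc(&d_sink, sizeof(uint64_t));
        cudaMalloc(&d_cycles, sizeof(long long));
        cudaMemcpy(d_buf, h, bytes, cudaMemcpyHostToDevice);

        // Many laps around the chain to amortize timing overhead.
        chase<<<1, 1>>>(d_buf, pages * 64, d_sink, d_cycles);
        cudaDeviceSynchronize();

        long long cycles;
        cudaMemcpy(&cycles, d_cycles, sizeof(long long), cudaMemcpyDeviceToHost);
        printf("working set %4zu MiB: ~%lld cycles per access\n", ws_mib, cycles);

        cudaFree(d_buf); cudaFree(d_sink); cudaFree(d_cycles);
        free(h);
    }
    return 0;
}
```

A single thread with dependent loads is used on purpose: with no independent work to overlap, each miss is exposed in full, so the step in the latency curve is easy to see.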
Upvotes: 2