drminix

Reputation: 131

If I use many local variables in a GPU kernel, will the variables reside in global memory?

If I use many variables in a GPU kernel, will the variables reside in global memory, so that reading and writing the local variables requires global memory accesses?

What is the typical limit on the number of variables in a GPU kernel so that the variables stay in registers?

Thanks, Sam

Upvotes: 2

Views: 2379

Answers (2)

CygnusX1

Reputation: 21779

Quick answer: Yes. Typical limit? If you want to reach an occupancy of around 0.5, that will be around 32-64 registers per thread, depending on the architecture.

A bit longer answer: Keep in mind that the number of registers is not exactly the same as the number of local variables. This is because, at any given time, you usually do not need all the local variables, and the compiler will try to reuse registers. You may end up with multiple variables mapped to the same register.
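As a sketch of this register reuse (variable and kernel names are hypothetical), consider two locals with non-overlapping live ranges; the compiler is free to place both in the same register:

```cuda
// Minimal sketch: `a` is dead before `b` is first written,
// so ptxas can assign both local variables to a single register.
__global__ void reuse_example(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float a = in[i] * 2.0f;  // live range of `a` ...
    out[i] = a;              // ... ends here

    float b = in[i] + 1.0f;  // `b` can now occupy the register `a` used
    out[i] += b;
}
```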

Secondly, even if you run out of register space, the compiler will spill the seldom-used values to local memory (which resides in global memory). Having a few register spills in your code is usually not that costly; moreover, those register spills lead to a perfectly aligned global memory access pattern.

If you want to know how many registers and how much (spilled) local memory each of your kernels uses, add --ptxas-options=-v to your compilation flags.
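A typical invocation (the source file name is hypothetical) looks like:

```
# --ptxas-options=-v makes ptxas print per-kernel register count,
# spill stores/loads, and local memory usage at compile time.
nvcc --ptxas-options=-v -c mykernel.cu -o mykernel.o
```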

Upvotes: 4

kangshiyin

Reputation: 9781

There is a CUDA GPU Occupancy Calculator located in the CUDA installation directory.

cuda-5.0/tools/CUDA_Occupancy_Calculator.xls

It can show you the relationship between hardware resources (threads per block, registers, shared memory) and warp occupancy, as well as the physical limits for each GPU compute capability.
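The register part of the spreadsheet's calculation can be sketched in Python. This is a simplification that assumes Kepler-class (compute capability 3.x) limits of 65536 registers and 2048 resident threads per SM, and ignores the shared-memory and block-count limits the real calculator also applies:

```python
def occupancy_from_registers(regs_per_thread,
                             regs_per_sm=65536,        # Kepler (cc 3.x) register file
                             max_threads_per_sm=2048):
    """Fraction of the SM's maximum resident threads that fit,
    limited only by register usage (a simplification of the
    CUDA Occupancy Calculator spreadsheet)."""
    threads_that_fit = min(max_threads_per_sm, regs_per_sm // regs_per_thread)
    return threads_that_fit / max_threads_per_sm

# 32 registers/thread still allows full occupancy on this model;
# 64 registers/thread halves it, matching the "32-64" rule of thumb above.
print(occupancy_from_registers(32))  # -> 1.0
print(occupancy_from_registers(64))  # -> 0.5
```

Doubling the registers per thread beyond the break-even point halves the number of resident threads, which is why heavy register use directly lowers occupancy.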

Upvotes: 3
