Reputation: 158
In CUDA, I understand that a variable is placed in shared memory if it is declared with __shared__, and in constant memory if it is declared with __constant__. Also, memory allocated with cudaMalloc() resides in GPU global memory. But where are variables without qualifiers such as __shared__, __constant__, or register placed? For example, the variable i below:
__device__ void func(){
int i=0;
return;
}
Upvotes: 1
Views: 261
Reputation: 131808
Automatic variables, i.e. variables without memory space specification within the scope of functions, are placed in one of the following locations:
When optimized away:
1.1 Nowhere - if the variable isn't actually necessary. This actually happens a lot, since CUDA functions are often inlined, with some variables becoming copies of a variable in the calling function. Example (note that the x from foo() is completely gone in the compilation of bar()).
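As an illustrative sketch (hypothetical foo()/bar(), not the linked example) of the kind of code where this happens:

```cuda
// Hypothetical illustration: after inlining, bar()'s compiled code
// contains no storage for foo()'s parameter x at all.
__device__ int foo(int x) {
    return x + 1;   // x is folded into the call site once foo() is inlined
}

__global__ void bar(int* out) {
    // The compiler inlines foo(); the "copy" of x never materializes
    // as a distinct variable in the generated code.
    out[threadIdx.x] = foo(threadIdx.x);
}
```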
1.2 Immediate values in the program's compiled code - if the variable's value is constant and never updated, its value may simply be "plugged" into the code. Here's an example with two variables holding constants, which are replaced by the single constant that is their sum.
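A minimal sketch of what such constant folding looks like (hypothetical kernel; the point is that neither a nor b gets any storage):

```cuda
__global__ void sum_constants(int* out) {
    // a and b are compile-time constants; the compiler emits the
    // immediate value 12 directly in the instruction stream rather
    // than allocating a register or memory for either variable.
    int a = 5;
    int b = 7;
    out[threadIdx.x] = a + b;   // compiled as a store of the immediate 12
}
```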
When not optimized away:
2.1 Registers - If your variable can't be optimized away, the best and most performant alternative, which the compiler prefers, is to keep it in a hardware register on (one of the symmetric multiprocessor cores of) the GPU. Example (the variables x and y are placed in registers %r1 and %r2).
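A sketch of the kind of source this refers to (the exact register numbers depend on the compiler; %r1/%r2 in the comments are illustrative):

```cuda
__global__ void scale(int* out, int n) {
    int x = threadIdx.x;   // typically held in a hardware register, e.g. %r1
    int y = x * n;         // another register, e.g. %r2
    // The generated PTX loads, multiplies and stores entirely through
    // registers; neither x nor y touches memory.
    out[x] = y;
}
```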
2.2 'Local' memory - The 'local memory' of a CUDA thread is an area in global device memory which is (in principle) accessible only by that thread.
Now, obviously, local memory is much slower to use. When will the compiler choose it, then?
The CUDA Programming Guide gives us the answer: local memory is typically used for arrays indexed with non-constant quantities, for large structures or arrays that would consume too much register space, and for any variable once the kernel exceeds its register budget (register spilling).
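A common case that forces local memory is dynamic indexing (a hedged sketch; whether the array is actually demoted depends on the compiler's analysis):

```cuda
__global__ void dynamic_index(int* out, int idx) {
    // An array indexed with a value unknown at compile time usually
    // cannot be split into individual registers, so the compiler
    // places it in 'local' memory: thread-private space carved out
    // of global device memory.
    int buf[64];
    for (int i = 0; i < 64; ++i)
        buf[i] = i * i;
    out[threadIdx.x] = buf[idx % 64];   // dynamic index -> local memory
}
```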
Upvotes: 2
Reputation: 50623
GPUs have a lot of space dedicated to registers, stored directly in the GPU computing units (i.e. Streaming Multiprocessors). Registers are not stored in memory unless register spilling happens (typically when a kernel uses too many registers). Unlike memory bytes, registers have no address. The same thing happens on CPUs, except that the number of CPU registers is usually much smaller than on GPUs. For example, an Intel Skylake core has 180 integer registers and 168 vector registers, while the instruction set architecture exposes only about 16 integer/vector registers. Note that in case of register spilling, register values are temporarily stored in local memory (typically in the L1 cache if possible). Here is the overall memory hierarchy of a basic (Nvidia Fermi) GPU:
For more information, consider reading: Local Memory and Register Spilling.
Upvotes: 1
Reputation: 2598
Local variables are either placed in hardware registers or local memory (which is effectively global memory).
In your example, however, the variable i will be removed by the compiler because it is unused.
Upvotes: 2