Reputation: 2346
I have a __constant__
memory array holding information that is needed by many kernels, which are placed in different source files. This constant memory array is defined in the header GlobalParameters.h
, which is #included
by all files containing kernels that need to access to this array.
I just discovered (look at talonmies' answer) that __constant memory__
is only available in the translation unit where it is defined, unless you turn on separate compilation (with CUDA 5.0
or later).
I still do not get completely what this means for my case.
Assuming that I cannot turn on separate compilation, is there a way for dealing with my needs? Where should I place the definition of my constant memory array? What if I place it in my header, which is #included
in many translation units?
Assuming I can turn on separate compilation, should I declare my __constant__
memory array in the header as extern
and place the definition inside a source file (e.g. GlobalParameters.cu
)?
Upvotes: 4
Views: 2390
Reputation: 1
Yes. The later CUDA doc says: When compiling in the separate compilation mode (see the nvcc user manual for a description of this mode), device, shared, managed and constant variables can be defined as external using the extern keyword. nvlink will generate an error when it cannot find a definition for an external variable (unless it is a dynamically allocated shared variable).
Upvotes: 0
Reputation: 21475
The below example assumes the possibility of using separate compilation. In this case, the below example shows how using extern
to work with constant memory across different compilation units.
FILE kernel.cu
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include "Utilities.cuh"
__constant__ int N_GPU;
__constant__ float a_GPU;
__global__ void printKernel();
int main()
{
const int N = 5;
const float a = 10.466;
gpuErrchk(cudaMemcpyToSymbol(N_GPU, &N, sizeof(int)));
gpuErrchk(cudaMemcpyToSymbol(a_GPU, &a, sizeof(float)));
printKernel << <1, 1 >> > ();
gpuErrchk(cudaPeekAtLastError());
gpuErrchk(cudaDeviceSynchronize());
return 0;
}
FILE otherCompilationUnit.cu
#include <stdio.h>
extern __constant__ int N_GPU;
extern __constant__ float a_GPU;
__global__ void printKernel() {
printf("N = %i; a = %f\n", N_GPU, a_GPU);
}
Upvotes: 2
Reputation: 4422
One way to make constant memory available to translation units other than the one where it is declared, is to call cudaGetSymbolAddress()
and make the pointer available to the other functions.
This technique is playing with fire to some degree, because if you use the pointer to write to the memory without appropriate barriers and synchronization, you may run afoul of the lack of coherency between constant memory and global memory.
Also, you may not get the full performance benefits of constant memory if you use this method. That should be less true on SM 2.x and later hardware - disassemble the object code and make sure the compiler is emitting "load uniform" instructions.
Upvotes: 3
Reputation: 1809
No, without using separate compilation it won't be possible to use the same constant memory, that is declared once, over several .cu files.
In my oppinion there are two ways for a workaround.
First one is to implement all kernels within one .cu file. Therefore you will get the disadvantage that this file will become very large with a bad overview.
A second way would be to declare in every .cu file the constant memory again. Then once with a wrapper copy the values into the constant memory for every single .cu file - like I described in an answer here. Disadvantages would be that you have to ensure that you don't forget to copy the values in single .cu files and you have to check that you won't run in the limitation of total available constant memory.
Upvotes: 0