Reputation: 111
I'm confused about copying arrays to constant memory.
According to the programming guide, there is at least one way to allocate constant memory and use it to store an array of values. This is called static memory allocation:
__constant__ float constData[256];
float data[256];
cudaMemcpyToSymbol(constData, data, sizeof(data));
cudaMemcpyFromSymbol(data, constData, sizeof(data));
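For reference, this is how I assume the statically declared array would then be read inside a kernel (just a sketch; the kernel name and the output buffer are made up by me):
__constant__ float constData[256];

__global__ void scaleByConstData(float* out)   // hypothetical kernel
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < 256)
        out[i] = 2.0f * constData[i];          // read served by the constant cache
}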
According to the programming guide again, we can also use:
__device__ float* devPointer;
float* ptr;
cudaMalloc(&ptr, 256 * sizeof(float));
cudaMemcpyToSymbol(devPointer, &ptr, sizeof(ptr));
It looks like dynamic constant memory allocation is being used here, but I'm not sure about that. Also, the __constant__ qualifier is not used at all.
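And this is how I imagine the second pattern would be used inside a kernel (again only a sketch with made-up names):
__device__ float* devPointer;

__global__ void scaleViaPointer(float* out, int n)   // hypothetical kernel
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = 2.0f * devPointer[i];               // dereferences the __device__ pointer
}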
So here are my questions: can constant memory be allocated dynamically at all, and does the second pattern actually place the array in constant memory even though __constant__ is not used?
Upvotes: 1
Views: 2829
Reputation: 4422
The developer can declare up to 64K of constant memory at file scope. In SM 1.0, the constant memory used by the toolchain (e.g. to hold compile-time constants) was separate and distinct from the constant memory available to developers, and I don't think this has changed since. The driver dynamically manages switching between different views of constant memory as it launches kernels that reside in different compilation units. Although you cannot allocate constant memory dynamically, this pattern suffices because the 64K limit is not system-wide; it applies per compilation unit.
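If you want to confirm the limit on your device, you can query it at run time (a minimal sketch; device 0 is assumed):
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);                            // query device 0
    printf("Constant memory: %zu bytes\n", prop.totalConstMem);   // typically 65536 (64K)
    return 0;
}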
Use the first pattern cited in your question: statically declare the constant data and update it with cudaMemcpyToSymbol
before launching kernels that reference it. In the second pattern, only reads of the pointer itself will go through constant memory. Reads using the pointer will be serviced by the normal L1/L2 cache hierarchy.
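As a sketch of that update-then-launch pattern (kernel and function names here are hypothetical):
__constant__ float constData[256];

__global__ void useConstData(float* out)
{
    int i = threadIdx.x;            // one block of 256 threads in this sketch
    out[i] = constData[i];          // read served by the constant cache
}

void launchWithTwoDataSets(float* d_out, const float* hostA, const float* hostB)
{
    // Fill the symbol, then launch the kernel that reads it.
    cudaMemcpyToSymbol(constData, hostA, 256 * sizeof(float));
    useConstData<<<1, 256>>>(d_out);

    // Update the same symbol before the next launch; default-stream
    // ordering keeps this copy from racing with the previous kernel.
    cudaMemcpyToSymbol(constData, hostB, 256 * sizeof(float));
    useConstData<<<1, 256>>>(d_out);

    cudaDeviceSynchronize();
}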
Upvotes: 6