Reputation: 1212
I have a mapping table which I know I can copy to CUDA constant memory by doing the following:
#define LENGTH 4
#define THREAD_BLOCKS 64

// __constant__ variables must be declared at file scope, not inside a function
__constant__ int dMapTable[LENGTH];

const int mapTable[LENGTH] = {0, 1, 3, 5};

int main()
{
    //..
    cudaMemcpyToSymbol(dMapTable, mapTable, LENGTH * sizeof(int), 0, cudaMemcpyHostToDevice);
    //..
}
Now what I want to do is create multiple copies of this table in CUDA constant memory, one copy per thread block (THREAD_BLOCKS copies in total). Can anyone advise me how to do this efficiently?
Upvotes: 1
Views: 948
Reputation: 15724
I would be very surprised if you see any improvement in kernel performance by setting up multiple copies of your constant data. The constant memory is cached, so you would just be thrashing the cache with duplicated values.
Also, it's worth noting that constant memory is limited to 64 KiB on all devices up to compute capability 3.0.
Still, if you want to check the performance yourself, just set up the multiple copies like you normally would (something like the sketch below) and time the kernel.
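For reference, one way to lay this out is a single 2D __constant__ array with one row per block, indexed by blockIdx.x inside the kernel. This is only a minimal sketch under that assumption; the kernel name useTables and the output buffer are hypothetical, and since every block reads identical values it is unlikely to beat a single shared copy.

#include <cuda_runtime.h>

#define LENGTH 4
#define THREAD_BLOCKS 64

// One copy of the table per thread block, all in constant memory.
// THREAD_BLOCKS * LENGTH * sizeof(int) = 1 KiB, well under the 64 KiB limit.
__constant__ int dMapTables[THREAD_BLOCKS][LENGTH];

// Hypothetical kernel: each block reads its own copy via blockIdx.x.
__global__ void useTables(int *out)
{
    if (threadIdx.x < LENGTH)
        out[blockIdx.x * LENGTH + threadIdx.x] = dMapTables[blockIdx.x][threadIdx.x];
}

int main()
{
    const int mapTable[LENGTH] = {0, 1, 3, 5};

    // Replicate the host table once per block, then copy everything in one call.
    int hostTables[THREAD_BLOCKS][LENGTH];
    for (int b = 0; b < THREAD_BLOCKS; ++b)
        for (int i = 0; i < LENGTH; ++i)
            hostTables[b][i] = mapTable[i];

    cudaMemcpyToSymbol(dMapTables, hostTables, sizeof(hostTables), 0,
                       cudaMemcpyHostToDevice);

    int *dOut;
    cudaMalloc(&dOut, THREAD_BLOCKS * LENGTH * sizeof(int));
    useTables<<<THREAD_BLOCKS, LENGTH>>>(dOut);
    cudaDeviceSynchronize();
    cudaFree(dOut);
    return 0;
}

Timing this against the single-copy version from the question should tell you quickly whether the duplication buys you anything.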
Upvotes: 1