Reputation: 1465
I am trying to implement a box filter in C/CUDA, starting with a matrix-averaging problem in CUDA first. When I run the following code without commenting out the lines inside the for loops, I get a certain output. But when I comment out those lines, I get the same output again!
if (tx == 0)
    for (int i = 1; i <= radius; i++)
    {
        //sharedTile[radius+ty][radius-i] = 6666.0;
    }
if (tx == (Dx-1))
    for (int i = 0; i < radius; i++)
    {
        //sharedTile[radius+ty][radius+Dx+i] = 7777;
    }
if (ty == 0)
    for (int i = 1; i <= radius; i++)
    {
        //sharedTile[radius-i][radius+tx] = 8888;
    }
if (ty == (Dy-1))
    for (int i = 0; i < radius; i++)
    {
        //sharedTile[radius+Dy+i][radius+tx] = 9999;
    }
if ((tx == 0) && (ty == 0))
    for (int i = globalRow, l = 0; i < HostPaddedRow, l < radius; i++, l++)
    {
        for (int j = globalCol, m = 0; j < HostPaddedCol, m < radius; j++, m++)
        {
            //sharedTile[l][m] = 8866;
        }
    }
if ((tx == (Dx-1)) && (ty == (Dx-1)))
    for (int i = (HostPaddedRow+1), l = (radius+Dx); i < (HostPaddedRow+1+radius), l < (TILE+2*radius); i++, l++)
    {
        for (int j = HostPaddedCol, m = (radius+Dx); j < (HostPaddedCol+radius), m < (TILE+2*radius); j++, m++)
        {
            //sharedTile[l][m] = 7799.0;
        }
    }
if ((tx == (Dx-1)) && (ty == 0))
    for (int i = globalRow, l = 0; i < HostPaddedRow, l < radius; i++, l++)
    {
        for (int j = (HostPaddedCol+1), m = (radius+Dx); j < (HostPaddedCol+1+radius), m < (TILE+2*radius); j++, m++)
        {
            //sharedTile[l][m] = 9966;
        }
    }
if ((tx == 0) && (ty == (Dy-1)))
    for (int i = (HostPaddedRow+1), l = (radius+Dy); i < (HostPaddedRow+1+radius), l < (TILE+2*radius); i++, l++)
    {
        for (int j = globalCol, m = 0; j < HostPaddedCol, m < radius; j++, m++)
        {
            //sharedTile[l][m] = 0.0;
        }
    }
__syncthreads();
You can ignore the for-loop conditions; they are irrelevant here right now. My basic problem and question is: why am I getting the same values even after commenting out those lines? I tried making some modifications in my main program and kernel as well. I also introduced errors manually, removed them, and compiled and executed the same code again, but I still get the same values. Is there any way to clear cache memory in CUDA? I am using Nsight + RedHat + CUDA 5.5. Thanks in advance.
Upvotes: 1
Views: 5488
Reputation: 5697
why am I getting the same vales even after commenting those lines?
It seems that sharedTile points to the same piece of memory across multiple consecutive runs, which is absolutely normal. The commented-out code therefore does not "generate" anything; your pointer is simply reading the same memory, which was never flushed.
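To see why it can look as if the commented-out writes are still "happening", here is a minimal, self-contained sketch (kernel name hypothetical, not from your code) that reads shared memory before any thread has written to it. What it prints is undefined, and in practice is often whatever a previous launch left behind on the same multiprocessor:
#include <cstdio>

// Reads shared memory that no thread has initialized. The values are
// undefined -- frequently leftover data from an earlier kernel launch.
__global__ void stale_read_demo() {
    __shared__ float tile[32];
    printf("thread %d sees %f\n", threadIdx.x, tile[threadIdx.x]);
}

int main() {
    stale_read_demo<<<1, 32>>>();
    cudaDeviceSynchronize();
    return 0;
}
Running this twice in a row will typically show the same stale values, which matches the behavior you are describing.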
Is there any way to clear cache memory in CUDA
I believe you are talking about clearing shared memory? If so, you can use an approach analogous to the one described here. Instead of using cudaMemset
in host code, you zero out your shared memory from inside the kernel. The simplest approach is to place the following code at the beginning of the kernel that declares sharedTile
(this version is for one-dimensional thread blocks and a one-dimensional shared memory array):
__global__ void your_kernel(int count) {
    // Dynamically allocated shared memory is declared as an unsized
    // array, not a pointer.
    extern __shared__ float sharedTile[];
    // Each thread zeroes a strided subset of the array.
    for (int i = threadIdx.x; i < count; i += blockDim.x)
        sharedTile[i] = 0.0f;
    __syncthreads(); // ensure the whole array is cleared before use
    // your code here
}
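Since your sharedTile appears to be a two-dimensional tile indexed as [radius+ty][radius+tx], the same idea extends to 2D thread blocks by flattening the tile and striding over it with every thread of the block. This is only a sketch: TILE and radius are taken from your snippet, and the static declaration and sizes are assumptions:
#define TILE   16                    // assumed tile width
#define RADIUS 2                     // assumed filter radius
#define W      (TILE + 2 * RADIUS)   // padded tile width

__global__ void box_filter_kernel(/* ...your parameters... */) {
    __shared__ float sharedTile[W][W];
    // Flatten the 2D tile and let every thread of the 2D block
    // stride over it, so each element is zeroed exactly once.
    int tid      = threadIdx.y * blockDim.x + threadIdx.x;
    int nthreads = blockDim.x * blockDim.y;
    for (int i = tid; i < W * W; i += nthreads)
        sharedTile[i / W][i % W] = 0.0f;
    __syncthreads();
    // ...load the halo and apply the box filter here...
}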
The following approach does not guarantee cleared shared memory, as Robert Crovella pointed out in a comment below: running nvidia-smi with the --gpu-reset parameter.
Upvotes: 2