I want to read a text file into an array, copy that array from the host to the device, and keep it in shared memory. I have written the following code, but it runs slower than the version that reads the data directly from global memory, and I cannot work out why. It would also be great if someone could help me write this code using constant memory.
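__global__ void deviceFunction(char *pBuffer, int pSize) {
    extern __shared__ char p[];
    int i;
    // Every thread copies the whole buffer from global into shared memory.
    for (i = 0; i < pSize; i++) {
        p[i] = pBuffer[i];
    }
}

int main(void) {
    cudaMalloc((void**)&pBuffer_device, sizeof(char) * pSize);
    cudaMemcpy(pBuffer_device, pBuffer, sizeof(char) * pSize, cudaMemcpyHostToDevice);
    // The third launch parameter sizes the extern __shared__ array.
    deviceFunction<<<BLOCK, THREAD, sizeof(char) * pSize>>>(pBuffer_device, pSize);
}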
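A possible explanation for the slowdown, with a sketch of both alternatives. In the kernel above every thread runs the full copy loop, so each thread still reads all `pSize` bytes from global memory and then writes them to shared memory as well; shared memory only pays off when the threads of a block split the load between them and then reuse the data. An `extern __shared__` array also gets its size from the third kernel launch parameter. The example below is a minimal sketch under assumptions, not the asker's actual workload: the letter-counting task, the names `countShared`, `countConstant`, `c_buf`, `MAX_BYTES` and `LETTERS`, and the sample string are all hypothetical, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

#define MAX_BYTES 4096   // hypothetical cap; constant memory totals 64 KB
#define LETTERS   26

// Constant memory is declared at file scope with a compile-time size.
__constant__ char c_buf[MAX_BYTES];

// Shared-memory variant: the threads of a block split the copy between them.
__global__ void countShared(const char *pBuffer, int pSize, int *out) {
    extern __shared__ char p[];
    for (int i = threadIdx.x; i < pSize; i += blockDim.x)
        p[i] = pBuffer[i];
    __syncthreads();  // the whole buffer must be loaded before anyone reads it

    // Illustrative work: thread t counts occurrences of the letter 'a' + t.
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t < LETTERS) {
        int n = 0;
        for (int i = 0; i < pSize; ++i)
            n += (p[i] == 'a' + t);
        out[t] = n;
    }
}

// Constant-memory variant: no pointer argument, the kernel reads c_buf directly.
__global__ void countConstant(int pSize, int *out) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t < LETTERS) {
        int n = 0;
        for (int i = 0; i < pSize; ++i)
            n += (c_buf[i] == 'a' + t);  // all threads read the same index together
        out[t] = n;
    }
}

int main(void) {
    const char *text = "store the text in shared or constant memory";
    int pSize = (int)strlen(text);

    char *d_buf;
    int *d_out;
    int h_shared[LETTERS], h_const[LETTERS];
    cudaMalloc((void **)&d_buf, pSize);
    cudaMalloc((void **)&d_out, LETTERS * sizeof(int));
    cudaMemcpy(d_buf, text, pSize, cudaMemcpyHostToDevice);

    // Third launch parameter = bytes of dynamic shared memory per block.
    countShared<<<1, 32, pSize>>>(d_buf, pSize, d_out);
    cudaMemcpy(h_shared, d_out, sizeof(h_shared), cudaMemcpyDeviceToHost);

    // Constant memory is filled from the host with cudaMemcpyToSymbol.
    cudaMemcpyToSymbol(c_buf, text, pSize);
    countConstant<<<1, 32>>>(pSize, d_out);
    cudaMemcpy(h_const, d_out, sizeof(h_const), cudaMemcpyDeviceToHost);

    int same = memcmp(h_shared, h_const, sizeof(h_shared)) == 0;
    printf("%s\n", same ? "shared and constant results agree" : "mismatch");

    cudaFree(d_buf);
    cudaFree(d_out);
    return same ? 0 : 1;
}
```

Constant memory tends to suit this kind of access, where all threads of a warp read the same element at the same time, and it needs no per-block copy. It is limited to 64 KB in total, so a larger file would have to stay in global memory or be processed in tiles through shared memory.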