Reputation: 11006
I have a texture that I use to read an image. It is declared as:
texture<uchar4, 2, cudaReadModeNormalizedFloat> text;
I have a CUDA kernel that uses this texture to read image pixel values:
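__global__ void resample_2D(float4* result,
                            int width, int height,
                            float* x, float* y)
{
    // One thread per output pixel.
    const int _x = blockDim.x * blockIdx.x + threadIdx.x;
    const int _y = blockDim.y * blockIdx.y + threadIdx.y;
    if (_x < width && _y < height) {
        const int i = _y * width + _x;
        // Sample the image at (x[i], y[i]); the +0.5f offsets address texel centers.
        // With cudaReadModeNormalizedFloat, each uchar channel comes back as a float in [0, 1].
        result[i] = tex2D(text, x[i] + 0.5f, y[i] + 0.5f);
    }
}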
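For reference, the `+ 0.5f` offset follows from how the CUDA Programming Guide defines texture fetching with unnormalized coordinates. This is a sketch that assumes linear filtering (`cudaFilterModeLinear`) and unnormalized coordinates, neither of which the snippet above shows. Sampling at $u = x[i] + 0.5$, $v = y[i] + 0.5$ over texels $T[m, n]$ gives:

$$
u_B = u - 0.5 = x[i],\quad v_B = v - 0.5 = y[i],\quad
m = \lfloor u_B \rfloor,\; n = \lfloor v_B \rfloor,\quad
\alpha = u_B - m,\; \beta = v_B - n
$$

$$
\mathrm{tex}(u, v) = (1-\alpha)(1-\beta)\,T[m, n] + \alpha(1-\beta)\,T[m+1, n] + (1-\alpha)\beta\,T[m, n+1] + \alpha\beta\,T[m+1, n+1]
$$

So integer $(x[i], y[i])$ land exactly on texel centers ($\alpha = \beta = 0$), and fractional coordinates interpolate between neighbours. With point filtering the result is simply $T[\lfloor u \rfloor, \lfloor v \rfloor]$, so the offset rounds $(x[i], y[i])$ to the nearest texel. In both cases, `cudaReadModeNormalizedFloat` returns each unsigned 8-bit channel $c$ as $c / 255$.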
Now, I have 4 CUDA streams that may read this texture concurrently (that is, they all access the same image bound to the texture). Does this sharing cause a performance hit? In terms of performance, is it better to have 4 textures (one per stream) rather than one texture shared by all streams?
Upvotes: 1
Views: 180
Reputation: 2250
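Texture fetches in CUDA go through a read-only texture cache. If kernels from several streams run concurrently on the same SMX and fetch the same texture locations, that can only increase cache hits. Because the kernels never write to the texture, sharing one texture across all streams needs no synchronization, so there is no need for one texture per stream.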
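A minimal sketch of one texture shared by four streams. It assumes the texture object API (`cudaTextureObject_t`) in place of the global texture reference `text` from the question, since texture references are deprecated and were removed in CUDA 12. The helper names `make_image_texture` and `resample_on_streams`, the linear filter and clamp settings, and the identity coordinates are hypothetical choices for illustration; error checking is omitted for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Texture-object version of the question's kernel (a swapped-in replacement
// for the global texture reference `text`).
__global__ void resample_2D(float4* result, int width, int height,
                            const float* x, const float* y,
                            cudaTextureObject_t tex)
{
    const int _x = blockDim.x * blockIdx.x + threadIdx.x;
    const int _y = blockDim.y * blockIdx.y + threadIdx.y;
    if (_x < width && _y < height) {
        const int i = _y * width + _x;
        result[i] = tex2D<float4>(tex, x[i] + 0.5f, y[i] + 0.5f);
    }
}

// Hypothetical helper: copies a uchar4 image into a cudaArray and wraps it
// in ONE texture object.
static cudaTextureObject_t make_image_texture(const uchar4* host, int w, int h,
                                              cudaArray_t* array_out)
{
    cudaChannelFormatDesc fmt = cudaCreateChannelDesc<uchar4>();
    cudaMallocArray(array_out, &fmt, w, h);
    const size_t row_bytes = w * sizeof(uchar4);
    cudaMemcpy2DToArray(*array_out, 0, 0, host, row_bytes, row_bytes, h,
                        cudaMemcpyHostToDevice);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = *array_out;

    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModeLinear;      // allowed because reads return floats
    td.readMode = cudaReadModeNormalizedFloat; // uchar 0..255 -> float 0..1
    td.normalizedCoords = 0;                   // coordinates are in texels

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);
    return tex;
}

// Hypothetical helper: runs the same resampling job on `num_streams` streams.
// Every launch reads the single shared texture; only outputs are per stream.
std::vector<std::vector<float4>> resample_on_streams(
    const std::vector<uchar4>& image, int w, int h, int num_streams)
{
    const size_t n = static_cast<size_t>(w) * h;
    assert(num_streams > 0 && image.size() == n);

    cudaArray_t array = nullptr;
    cudaTextureObject_t tex = make_image_texture(image.data(), w, h, &array);

    // Hypothetical input: identity coordinates (x = column, y = row).
    std::vector<float> hx(n), hy(n);
    for (int r = 0; r < h; ++r)
        for (int c = 0; c < w; ++c) {
            hx[r * w + c] = static_cast<float>(c);
            hy[r * w + c] = static_cast<float>(r);
        }
    float *dx = nullptr, *dy = nullptr;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dx, hx.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    std::vector<cudaStream_t> streams(num_streams);
    std::vector<float4*> d_out(num_streams, nullptr);
    std::vector<std::vector<float4>> out(num_streams, std::vector<float4>(n));
    const dim3 block(16, 16);
    const dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);

    for (int s = 0; s < num_streams; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&d_out[s], n * sizeof(float4));
        // Same `tex` handle in every stream: concurrent read-only fetches.
        resample_2D<<<grid, block, 0, streams[s]>>>(d_out[s], w, h, dx, dy, tex);
        // Pageable host memory here; pinned memory is needed for real overlap.
        cudaMemcpyAsync(out[s].data(), d_out[s], n * sizeof(float4),
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();

    for (int s = 0; s < num_streams; ++s) {
        cudaStreamDestroy(streams[s]);
        cudaFree(d_out[s]);
    }
    cudaFree(dx);
    cudaFree(dy);
    cudaDestroyTextureObject(tex);
    cudaFreeArray(array);
    return out;
}
```

Because `tex` is a plain handle passed to each launch by value, nothing about the texture has to be duplicated per stream; only the buffers that are written (`d_out`) are separate, while the read-only coordinate arrays are shared as well.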
Upvotes: 1