Reputation: 148
I'm starting with OpenCL programming and learning about the differences between a texture buffer (also called an image) and a regular buffer. From what I understand, one of these differences is that texture fetches are cached, with 2D locality.
The question is: where is this texture cache located? Is it shared across threads or is it just useful for accesses within a single thread?
For instance, consider this kernel:
__constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;
__kernel void myCoolKernel(__read_only image2d_t image, __global float *dst) {
    // One work-item per pixel: i is the column, j is the row
    const int i = get_global_id(0);
    const int j = get_global_id(1);
    dst[i + j * get_global_size(0)] = read_imagef(image, sampler, (int2)(i, j)).x;
}
Will nearby threads (on the same work-group I guess?) take advantage of the image cache? Or is it more useful for a kernel like the one below?
__kernel void myCoolKernel(__read_only image2d_t image, __global float *dst) {
    // Pray it's not a boundary
    const int i = get_global_id(0);
    const int j = get_global_id(1);
    // Each work-item reads its four direct neighbours
    float pixel1 = read_imagef(image, sampler, (int2)(i + 1, j)).x;
    float pixel2 = read_imagef(image, sampler, (int2)(i, j - 1)).x;
    float pixel3 = read_imagef(image, sampler, (int2)(i - 1, j)).x;
    float pixel4 = read_imagef(image, sampler, (int2)(i, j + 1)).x;
    dst[i + j * get_global_size(0)] = pixel1 + pixel2 + pixel3 + pixel4;
}
I hope I made myself clear. Thanks everyone.
Upvotes: 1
Views: 982
Reputation: 1055
It'll be GPU dependent. For example, some systems may have several "independent" shader "units", each of which will be running a subset of all available threads. Each shader unit will probably have its own L0 texture cache so that all threads assigned to that unit will be sharing that cache.
However, like a CPU, there will probably be a cache hierarchy such that there's an L1 cache that feeds the multiple shader unit L0s.
So to answer your question...
Will nearby threads (on the same work-group I guess?) take advantage of the image cache?
... yes, if the accesses of the threads, taken as a set, are coherent, then they will take advantage of the cache hierarchy.
FWIW, there's a little more on texture cache usage on the computer graphics site.
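To put a rough number on how much a coherent set of threads can share, here is a minimal sketch in plain C (host-side, not OpenCL). It counts how many distinct texels one work-group touches when every work-item reads its four neighbours, as in the second kernel of the question. The work-group size `WG` and the function names are assumptions made up for this illustration, and it says nothing about how any particular GPU's cache actually behaves.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical work-group edge length, chosen for illustration only. */
#define WG 16

/* Total number of image reads issued by a WG x WG work-group
 * when each work-item performs four fetches. */
int total_fetches(void)
{
    return 4 * WG * WG;
}

/* Number of distinct texels those reads land on. The footprint is the
 * work-group's own WG x WG block plus a one-texel border around it. */
int distinct_texels(void)
{
    bool touched[WG + 2][WG + 2];
    memset(touched, 0, sizeof touched);

    /* Work-item (i, j) is stored at offset +1 to leave room for the border. */
    for (int j = 1; j <= WG; ++j) {
        for (int i = 1; i <= WG; ++i) {
            touched[j][i + 1] = true;  /* (i+1, j) */
            touched[j - 1][i] = true;  /* (i, j-1) */
            touched[j][i - 1] = true;  /* (i-1, j) */
            touched[j + 1][i] = true;  /* (i, j+1) */
        }
    }

    int count = 0;
    for (int j = 0; j < WG + 2; ++j)
        for (int i = 0; i < WG + 2; ++i)
            count += touched[j][i] ? 1 : 0;
    return count;
}
```

With `WG` set to 16, `total_fetches()` is 1024 while `distinct_texels()` is 320, so most reads hit a texel that some other work-item in the group also reads. That overlap is what a cache shared between threads can exploit.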
Upvotes: 1
Reputation: 6333
It is shared between threads. It exists because of textures in graphics APIs (OpenGL, DirectX, Vulkan, etc.), but OpenCL images can use it too. It typically has 2D locality, probably due to Z-order storage. It is much faster than non-coalesced buffer access, but maybe not as fast as coalesced buffer access.
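As an illustration of why Z-order storage would give 2D locality, here is a small sketch in plain C (host-side, not OpenCL). It compares how many cache lines an 8x8 block of texels spans under a row-major layout and under a Z-order (Morton) layout. The line size of 16 texels, the 8x8 block, and all function names are assumptions for this example; the real image layout is vendor-specific and not exposed by OpenCL.

```c
#include <stdint.h>

/* Assumed cache line size, in texels (hypothetical value). */
#define TEXELS_PER_LINE 16u
/* Edge length of the block of texels being read (hypothetical value). */
#define TILE 8u

/* Spread the low 16 bits of v onto the even bit positions. */
uint32_t spread_bits(uint32_t v)
{
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

/* Z-order (Morton) index: interleave the bits of x and y. */
uint32_t morton_index(uint32_t x, uint32_t y)
{
    return spread_bits(x) | (spread_bits(y) << 1);
}

/* Conventional row-major index for an image of the given width. */
uint32_t row_major_index(uint32_t x, uint32_t y, uint32_t width)
{
    return y * width + x;
}

/* Count the distinct cache lines covered by a TILE x TILE block whose
 * top-left texel is (x0, y0), under either layout. */
int lines_touched(int use_morton, uint32_t x0, uint32_t y0, uint32_t width)
{
    uint32_t seen[TILE * TILE];
    int n = 0;

    for (uint32_t dy = 0; dy < TILE; ++dy) {
        for (uint32_t dx = 0; dx < TILE; ++dx) {
            uint32_t idx = use_morton
                ? morton_index(x0 + dx, y0 + dy)
                : row_major_index(x0 + dx, y0 + dy, width);
            uint32_t line = idx / TEXELS_PER_LINE;

            /* Linear search is fine for at most 64 entries. */
            int found = 0;
            for (int k = 0; k < n; ++k)
                if (seen[k] == line) { found = 1; break; }
            if (!found)
                seen[n++] = line;
        }
    }
    return n;
}
```

Under these assumptions, for a 1024-texel-wide image, `lines_touched(0, 0, 0, 1024)` gives 8 lines for the row-major layout and `lines_touched(1, 0, 0, 1024)` gives 4 for Z-order, because the Morton layout keeps vertically adjacent texels close together in memory.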
Upvotes: 2