Reputation: 149
We've successfully used the post linked below to create structs that contain basic pointer types such as int *. Texture objects give a nice performance boost for read-only arrays. We use many of them, which makes the argument lists of our kernels and their sub-functions long and complicated. We'd like to embed the texture objects in structures to shorten and simplify those argument lists.
Copying a struct containing pointers to CUDA device
Here's a snippet representing the code methodology we use. It compiles, but crashes at run-time.
// Initialize texture description
cudaTextureDesc textureDescription;
memset(&textureDescription, 0, sizeof(textureDescription));
textureDescription.readMode = cudaReadModeElementType;
// Create Texture from variable
cudaTextureObject_t texture = 0;
cudaResourceDesc resource;
memset(&resource, 0, sizeof(resource));
resource.resType = cudaResourceTypeLinear;
resource.res.linear.devPtr = intArray;
resource.res.linear.desc.f = cudaChannelFormatKindSigned;
resource.res.linear.desc.x = 32; // bits per channel
resource.res.linear.sizeInBytes = count*sizeof(int);
cudaCreateTextureObject(&texture, &resource, &textureDescription, NULL);
// These declarations are in the .h file
typedef struct SampleStructure {
cudaTextureObject_t texture;
} SampleStructure;
SampleStructure *structureHost;
SampleStructure *structureDevice;
// Create host and device structures
structureHost = (SampleStructure *)malloc(sizeof(SampleStructure));
cudaMalloc(&structureDevice, sizeof(SampleStructure));
// Assign the texture object to the host structure
structureHost->texture = texture;
// Copy the host structure to Global Memory
cudaMemcpy(structureDevice, structureHost, sizeof(SampleStructure), cudaMemcpyHostToDevice);
// Pass Texture and Texture-embedded-in-structure to kernel
kernel<<<1,1>>>(texture, structureDevice);
...
__global__ void
kernel(cudaTextureObject_t texture, SampleStructure *structureDevice) {
value = tex1Dfetch<int>(texture, index); // Runs successfully at runtime
value = tex1Dfetch<int>(structureDevice->texture, index); // Crashes at runtime
}
When using the "texture" variable in the kernel code (or sub-function), it runs correctly. When using "structureDevice->texture" instead, it crashes at run-time.
Can someone post a simple example showing how to embed a texture object in a struct that's passed to a kernel and runs without crashing? Or can someone point out where the mistake might be in the code we've presented?
Upvotes: 3
Views: 1039
Reputation: 149
Passing the structure by value gave a working solution. Here is the equivalent code that works. Thanks to @talonmies for the suggestion.
While a structure can simplify the argument list, it can slow down execution because the system has to make 2 memory accesses instead of 1: one to get the structure and one to get the texture. To improve performance, the structure can be copied to shared memory and used from there.
// Create the Texture Object (textureDescription is initialized as in the question)
cudaTextureObject_t texture = 0;
cudaResourceDesc resource;
memset(&resource, 0, sizeof(resource));
resource.resType = cudaResourceTypeLinear;
resource.res.linear.devPtr = intArray;
resource.res.linear.desc.f = cudaChannelFormatKindSigned;
resource.res.linear.desc.x = 32; // bits per channel
resource.res.linear.sizeInBytes = count*sizeof(int);
cudaCreateTextureObject(&texture, &resource, &textureDescription, NULL);
// These structure declarations are in the .h file
typedef struct SampleStructure {
cudaTextureObject_t texture;
} SampleStructure;
SampleStructure structureHost;
// Assign the texture object to the host structure
structureHost.texture = texture;
// Pass Texture and Texture-object-embedded-in-structure to kernel
kernel<<<1,1>>>(texture, structureHost);
...
__global__ void
kernel(cudaTextureObject_t texture, SampleStructure structureDevice) {
__shared__ SampleStructure structureSharedMemory;
// Copy the structure to shared memory for faster access
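if (threadIdx.x == 0)
    structureSharedMemory = structureDevice;
// Barrier so every thread in the block sees the copy before reading it;
// __threadfence_block() orders memory accesses but does not synchronize threads
__syncthreads();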
value = tex1Dfetch<int>(texture, index); // Runs successfully at runtime
value = tex1Dfetch<int>(structureSharedMemory.texture, index); // Runs successfully at runtime
}
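For anyone who wants something that compiles on its own, here is a minimal sketch of the pass-by-value approach as a complete program. It is not the exact code from our project. The names CUDA_CHECK, fetchKernel, and fetchThroughStruct are hypothetical helpers introduced only for this illustration, and the input is assumed to be non-empty. It skips the shared-memory copy and reads the handle straight from the kernel argument, since kernel arguments are delivered to the device through constant memory.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): abort on any CUDA error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

struct SampleStructure {
    cudaTextureObject_t texture;
};

// The struct arrives by value as a kernel argument, so no device-side
// pointer has to be dereferenced to reach the texture handle.
__global__ void fetchKernel(SampleStructure s, int *out, int count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count)
        out[i] = tex1Dfetch<int>(s.texture, i);
}

// Hypothetical driver: copies `input` (assumed non-empty) to the device,
// reads it back through the texture embedded in the struct, and returns it.
std::vector<int> fetchThroughStruct(const std::vector<int> &input) {
    const int count = static_cast<int>(input.size());
    const size_t bytes = count * sizeof(int);
    int *dIn = nullptr, *dOut = nullptr;
    CUDA_CHECK(cudaMalloc(&dIn, bytes));
    CUDA_CHECK(cudaMalloc(&dOut, bytes));
    CUDA_CHECK(cudaMemcpy(dIn, input.data(), bytes, cudaMemcpyHostToDevice));

    // Describe the linear device buffer backing the texture
    cudaResourceDesc resource;
    memset(&resource, 0, sizeof(resource));
    resource.resType = cudaResourceTypeLinear;
    resource.res.linear.devPtr = dIn;
    resource.res.linear.desc = cudaCreateChannelDesc<int>();
    resource.res.linear.sizeInBytes = bytes;

    cudaTextureDesc textureDescription;
    memset(&textureDescription, 0, sizeof(textureDescription));
    textureDescription.readMode = cudaReadModeElementType;

    // The texture handle lives in a plain host-side struct
    SampleStructure s;
    CUDA_CHECK(cudaCreateTextureObject(&s.texture, &resource,
                                       &textureDescription, NULL));

    const int threads = 128;
    const int blocks = (count + threads - 1) / threads;
    fetchKernel<<<blocks, threads>>>(s, dOut, count);  // struct passed by value
    CUDA_CHECK(cudaGetLastError());        // launch errors
    CUDA_CHECK(cudaDeviceSynchronize());   // errors raised while the kernel ran

    std::vector<int> result(count);
    CUDA_CHECK(cudaMemcpy(result.data(), dOut, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaDestroyTextureObject(s.texture));
    CUDA_CHECK(cudaFree(dIn));
    CUDA_CHECK(cudaFree(dOut));
    return result;
}

int main() {
    std::vector<int> input = {10, 20, 30, 40};
    std::vector<int> output = fetchThroughStruct(input);
    assert(output == input);  // values read through the texture match the input
    for (int v : output) printf("%d\n", v);
    return 0;
}
```

Checking the return value of every runtime call, plus cudaGetLastError and cudaDeviceSynchronize after the launch, should also help localize a crash like the one in the question to a specific call.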
Upvotes: 4