Kristian D'Amato
Kristian D'Amato

Reputation: 4046

Fetching CUDA texture problems

I am having trouble fetching a texture of floats. The texture is defined as follows:

texture<float, 2, cudaReadModeElementType> cornerTexture;

The binding and parameter settings are:

cornerTexture.addressMode[0]    = cudaAddressModeClamp;
cornerTexture.addressMode[1]    = cudaAddressModeClamp;
cornerTexture.filterMode        = cudaFilterModePoint;
cornerTexture.normalized        = false;
cudaChannelFormatDesc cornerDescription = cudaCreateChannelDesc<float>();


cudaBindTexture2D(0, &cornerTexture, cornerImage->imageData_device, &cornerDescription, cornerImage->width, cornerImage->height, cornerImage->widthStep);

height and width are the sizes of the two dimensions in terms of numbers of elements. widthStep is in terms of number of bytes. In-kernel access occurs as follows:

thisValue = tex2D(cornerTexture, thisPixel.x, thisPixel.y);
printf("thisPixel.x: %i thisPixel.y: %i thisValue: %f\n", thisPixel.x, thisPixel.y, thisValue);

thisValue should always be a non-negative float. printf() is giving me strange, useless values that are different from what the linear memory actually stores. I have tried offsetting the access with a 0.5f on both coordinates, but it gives me the same wrong results.

Any ideas?

Update There seems to be a hidden alignment requirement. From what I can deduce, the pitch passed to the cudaBindTexture function needs to be a multiple of 32 bytes. For example, the following gives incorrect results

cudaBindTexture2D(0, &debugTexture, deviceFloats, &debugDescription, 10, 32, 40)

when fetching the texture, but the following (the same array with its width and height switched) works well:

cudaBindTexture2D(0, &debugTexture, deviceFloats, &debugDescription, 32, 10, 128)

I'm not sure whether I'm missing something or there really is a constraint on the pitch.

Update 2: I have filed a bug report with Nvidia. Those who are interested can view it in their developer zone, but I will post the reply back here.

Upvotes: 2

Views: 3487

Answers (3)

ArchaeaSoftware
ArchaeaSoftware

Reputation: 4422

There is definitely a constraint on the pitch, and unfortunately there is no device properties query to ask CUDA what it is.

But if you allocate the memory with cudaMallocPitch() and use the pitch passed back, that is guaranteed to work.

Upvotes: 3

pQB
pQB

Reputation: 3127

Did you get the structure associated to the texture using the cudaGetTextureReference function?

From version 3.2 of the NVIDIA C Programming Guide (page 32, last paragraph):

The format specified when binding a texture to a texture reference must match the parameters specified when declaring the texture reference; otherwise, the results of texture fetches are undefined.

Upvotes: 1

Kristian D&#39;Amato
Kristian D&#39;Amato

Reputation: 4046

Nvidia reply to bug report:

"The problem here is that the memory bound to the 2D texture does not have the proper alignment restrictions. Both the base offset of the texture memory, and the pitch, have certain HW dependant alignment restrictions. However, currently in the CUDA API, we only expose the base offset restriction as a device property, and not the pitch restriction.

The pitch restriction will be addressed in a future CUDA release. Meanwhile, it's recommended that apps use cudaMallocPitch() when allocating pitched memory, so that the driver takes care of satisfying all restrictions."

Upvotes: 2

Related Questions