tex1Dfetch unexpectedly returning 0

Question

I don't believe this is the same issue as reported here :

In my CUDA application I noticed that tex1Dfetch is not returning the expected value, past a certain index in the buffer. An initial observation in the application was that a value at index 0 could be read correctly, but at 12705625, the value read was 0. I made a small test program to investigate this, given below. The results are a little bit baffling to me. I'm trying to probe at what index the values no longer are read correctly. But as the value arraySize is changed, so does the "firstBadIndex". Even with arraySize =2, the second value is read incorrectly! As arraySize is made bigger, the firstBadIndex gets bigger. This happens when binding to arrays of float, float2, or float4. If the data are read from the device buffer instead (switch around the commented lines in FetchTextureData), then everything is fine. This is using CUDA 6.5, on a Tesla c2075. Thanks for any insights or advice you might have.

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include 

#define FLOATTYPE float4
texture texture1D;

const unsigned int arraySize = 1000;
FLOATTYPE* host;
FLOATTYPE* device;
FLOATTYPE* dTemp;
FLOATTYPE hTemp[1];

__global__ void FetchTextureData(FLOATTYPE* data,FLOATTYPE* arr,int idx)
{
    data[0] = tex1Dfetch(texture1D, idx);
    //data[0] = arr[idx];
}

bool GetTextureValues(int idx){

    FetchTextureData<<<1,1>>>(dTemp,device,idx);

    // copy to the host
    cudaError_t err = cudaMemcpy(hTemp,dTemp,sizeof(FLOATTYPE),cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        throw "cudaMemcpy failed!";
    }

    if (cudaDeviceSynchronize() != cudaSuccess) {
        throw "cudaDeviceSynchronize failed!";
    }

    return hTemp[0].x == 1.0f;
}

int main()
{

    try{

        host = new FLOATTYPE[arraySize];
        cudaError_t err = cudaMalloc((void**)&device,sizeof(FLOATTYPE) * arraySize);
        cudaError_t err1 = cudaMalloc((void**)&dTemp,sizeof(FLOATTYPE));
        if (err != cudaSuccess || err1 != cudaSuccess) {
            throw "cudaMalloc failed!";
        }

        // make some host data
        for(unsigned int i=0; i();
        texture1D.addressMode[0] = cudaAddressModeClamp;
        texture1D.filterMode = cudaFilterModePoint;
        texture1D.normalized = false;
        cudaBindTexture(NULL, texture1D, device, channelDesc, arraySize);

        // do a texture fetch and find where the fetches stop working
        int lastGoodValue = -1, firstBadValue = -1;
        float4 badValue = {-1.0f,0.0f,0.0f,0.0f};

        for(unsigned int i=0; i

talonmies · Accepted Answer

The problem lies in the texture set up. This:

cudaBindTexture(NULL, texture1D, device, channelDesc, arraySize);

should be:

    cudaBindTexture(NULL, texture1D, device, channelDesc, 
                                   arraySize * sizeof(FLOATTYPE));

As per the documentation, the size argument is the size of the memory area in bytes, not the number of elements. I would have expected that with the clamped addressing mode, the code would still work as expected. With border mode, you should get a zero value which looks like it would trigger your bad value detection. I haven't actually run your code, so perhaps there is a subtlety I'm missing somewhere. For such a simple repro case, your code structure is rather convoluted and hard to follow (at least on the mobile phone screen I am reading it on).

EDIT to add that between the time I started writing this and finished, @njuffa pointed out the same mistake in comments

tex1Dfetch unexpectedly returning 0

Answers (1)

Related Questions