Reputation: 381
I don't believe this is the same issue as reported here :
CUDA 1D texture fetch always return 0
In my CUDA application I noticed that tex1Dfetch is not returning the expected value, past a certain index in the buffer. An initial observation in the application was that a value at index 0 could be read correctly, but at 12705625, the value read was 0. I made a small test program to investigate this, given below. The results are a little bit baffling to me. I'm trying to probe at what index the values no longer are read correctly. But as the value arraySize is changed, so does the "firstBadIndex". Even with arraySize =2, the second value is read incorrectly! As arraySize is made bigger, the firstBadIndex gets bigger. This happens when binding to arrays of float, float2, or float4. If the data are read from the device buffer instead (switch around the commented lines in FetchTextureData), then everything is fine. This is using CUDA 6.5, on a Tesla c2075. Thanks for any insights or advice you might have.
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#define FLOATTYPE float4
texture<FLOATTYPE,cudaTextureType1D,cudaReadModeElementType> texture1D;
const unsigned int arraySize = 1000;
FLOATTYPE* host;
FLOATTYPE* device;
FLOATTYPE* dTemp;
FLOATTYPE hTemp[1];
__global__ void FetchTextureData(FLOATTYPE* data,FLOATTYPE* arr,int idx)
{
data[0] = tex1Dfetch(texture1D, idx);
//data[0] = arr[idx];
}
bool GetTextureValues(int idx){
FetchTextureData<<<1,1>>>(dTemp,device,idx);
// copy to the host
cudaError_t err = cudaMemcpy(hTemp,dTemp,sizeof(FLOATTYPE),cudaMemcpyDeviceToHost);
if (err != cudaSuccess) {
throw "cudaMemcpy failed!";
}
if (cudaDeviceSynchronize() != cudaSuccess) {
throw "cudaDeviceSynchronize failed!";
}
return hTemp[0].x == 1.0f;
}
int main()
{
try{
host = new FLOATTYPE[arraySize];
cudaError_t err = cudaMalloc((void**)&device,sizeof(FLOATTYPE) * arraySize);
cudaError_t err1 = cudaMalloc((void**)&dTemp,sizeof(FLOATTYPE));
if (err != cudaSuccess || err1 != cudaSuccess) {
throw "cudaMalloc failed!";
}
// make some host data
for(unsigned int i=0; i<arraySize; i++){
FLOATTYPE data = {1.0f, 0.0f, 0.0f, 0.0f};
host[i] = data;
}
// and copy it to the device
err = cudaMemcpy(device,host,sizeof(FLOATTYPE) * arraySize,cudaMemcpyHostToDevice);
if (err != cudaSuccess){
throw "cudaMemcpy failed!";
}
// set up the textures
cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<FLOATTYPE>();
texture1D.addressMode[0] = cudaAddressModeClamp;
texture1D.filterMode = cudaFilterModePoint;
texture1D.normalized = false;
cudaBindTexture(NULL, texture1D, device, channelDesc, arraySize);
// do a texture fetch and find where the fetches stop working
int lastGoodValue = -1, firstBadValue = -1;
float4 badValue = {-1.0f,0.0f,0.0f,0.0f};
for(unsigned int i=0; i<arraySize; i++){
if(i % 100000 == 0) printf("%d\n",i);
bool isGood = GetTextureValues(i);
if(firstBadValue == -1 && !isGood)
firstBadValue = i;
if(isGood)
lastGoodValue = i;
else
badValue = hTemp[0];
}
printf("lastGoodValue %d, firstBadValue %d\n",lastGoodValue,firstBadValue);
printf("Bad value is (%.2f)\n",badValue.x);
}catch(const char* err){
printf("\nCaught an error : %s\n",err);
}
return 0;
}
Upvotes: 1
Views: 267
Reputation: 72349
The problem lies in the texture set up. This:
cudaBindTexture(NULL, texture1D, device, channelDesc, arraySize);
should be:
cudaBindTexture(NULL, texture1D, device, channelDesc,
arraySize * sizeof(FLOATTYPE));
As per the documentation, the size argument is the size of the memory area in bytes, not the number of elements. I would have expected that with the clamped addressing mode, the code would still work as expected. With border mode, you should get a zero value which looks like it would trigger your bad value detection. I haven't actually run your code, so perhaps there is a subtlety I'm missing somewhere. For such a simple repro case, your code structure is rather convoluted and hard to follow (at least on the mobile phone screen I am reading it on).
EDIT to add that between the time I started writing this and finished, @njuffa pointed out the same mistake in comments
Upvotes: 3