Dustin Soodak

Reputation: 583

CUDA Error on cudaBindTexture

I have exactly the same problem as described in the post: CUDA Error on cudaBindTexture2D

I even get the same error:

error 18: invalid texture reference

and, as in that post, the error isn't reported after cudaMalloc, only after cudaBindTexture.

Unfortunately, the poster (Anton Roth) answered his own question in a way that was a bit too cryptic for someone like me who is just starting out with CUDA:

The answer was in the comments, I used a sm that my GPU wasn't compatible to.

The "not compatible with GPU" part makes sense, since the sample program FluidsGL (called "Fluids (OpenGL Version)" in the NVIDIA CUDA Samples Browser) fails on my laptop but works fine on my desktop at work. Unfortunately, I still don't know what "in the comments" was referring to, or even how to check which SM versions my GPU is compatible with.

Here is the code that seems to be causing the issue:

#define DIM 512

In main:

setupTexture(DIM, DIM);
bindTexture();

In fluidsGL_kernels.cu:

texture<float2, 2> texref;
static cudaArray *array = NULL;

void setupTexture(int x, int y)
{
    // Wrap mode appears to be the new default
    texref.filterMode = cudaFilterModeLinear;
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float2>();

    cudaMallocArray(&array, &desc, y, x);
    getLastCudaError("cudaMalloc failed");
}

void bindTexture(void)
{
    cudaBindTextureToArray(texref, array);//this function itself doesn't throw the error but error 18 is caught by the function below
    getLastCudaError("cudaBindTexture failed");
}
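For comparison, here is a minimal sketch of the same setup with every CUDA call's return value checked directly, rather than through the samples' getLastCudaError() helper, so the failing call is named explicitly. It assumes a CUDA toolkit older than 12.0 (texture references, used here as in fluidsGL_kernels.cu, were removed in 12.0); the file name, the ok() helper and sampleKernel are hypothetical names for illustration only.

```cuda
// bind_checked.cu: setupTexture()/bindTexture() from the question, with every
// CUDA call's return value checked. Assumes a CUDA toolkit older than 12.0,
// since the legacy texture reference API was removed in 12.0.
#include <cstdio>
#include <cuda_runtime.h>

#define DIM 512

texture<float2, 2> texref;        // legacy texture reference, as in fluidsGL_kernels.cu
static cudaArray *array = NULL;

// Hypothetical kernel, never launched: it only makes device code use texref,
// the way fluidsGL's own kernels do.
__global__ void sampleKernel(float2 *out)
{
    out[0] = tex2D(texref, 0.5f, 0.5f);
}

// Hypothetical helper: prints the failing call with its numeric and textual
// error, and returns false on failure.
static bool ok(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: error %d (%s)\n",
                     what, static_cast<int>(err), cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main()
{
    texref.filterMode = cudaFilterModeLinear;
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float2>();

    // cudaMallocArray(array, desc, width, height); DIM x DIM is square.
    if (!ok(cudaMallocArray(&array, &desc, DIM, DIM), "cudaMallocArray"))
        return 1;

    // Check the bind call's own return value instead of relying on a later
    // getLastCudaError(); this is where the "invalid texture reference"
    // error reported in the question would be expected to surface.
    if (!ok(cudaBindTextureToArray(texref, array), "cudaBindTextureToArray")) {
        cudaFreeArray(array);
        return 1;
    }

    std::printf("texture bound OK\n");
    cudaUnbindTexture(texref);
    cudaFreeArray(array);
    return 0;
}
```

Built without code for the GPU's architecture, this would likely stop at cudaBindTextureToArray with the same message; built with a matching -gencode option, it should print "texture bound OK".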

Hardware information

Here is the output of deviceQuery:

Device 0: "GeForce 9800M GS"
  CUDA Driver Version / Runtime Version          5.0 / 5.0
  CUDA Capability Major/Minor version number:    1.1
  Total amount of global memory:                 1024 MBytes (1073741824 bytes)
  ( 8) Multiprocessors x (  8) CUDA Cores/MP:    64 CUDA Cores
  GPU Clock rate:                                1325 MHz (1.32 GHz)
  Memory Clock rate:                             799 Mhz
  Memory Bus Width:                              256-bit
  Max Texture Dimension Size (x,y,z)             1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers        1D=(8192) x 512, 2D=(8192,8192) x 512
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 8192
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  768
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      No
  Device PCI Bus ID / PCI location ID:           8 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = GeForce 9800M GS

I know my GPU is kind of old, but it still runs most of the examples pretty well.

Upvotes: 0

Views: 4267

Answers (1)

BenC

Reputation: 8976

You need to compile your code for the proper architecture (as explained in the post you linked).

Since you have a compute capability (CC) 1.1 device, use the following nvcc compilation option:

-gencode arch=compute_11,code=sm_11
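To find a device's compute capability programmatically (deviceQuery reports the same value as "CUDA Capability Major/Minor version number"), a minimal sketch like the following reads it with cudaGetDeviceProperties and prints the corresponding -gencode option in the form shown above. The file name and the gencodeFor() helper are hypothetical, and the sketch assumes a C++11 host compiler for std::to_string.

```cuda
// check_cc.cu: print each GPU's compute capability and a matching -gencode option.
// Assumed build line (not from the original post): nvcc check_cc.cu -o check_cc
#include <cstdio>
#include <string>
#include <cuda_runtime.h>

// Hypothetical helper: builds "-gencode arch=compute_XY,code=sm_XY" for
// compute capability X.Y, i.e. native machine code (SASS) for that GPU.
std::string gencodeFor(int major, int minor)
{
    std::string cc = std::to_string(major) + std::to_string(minor);
    return "-gencode arch=compute_" + cc + ",code=sm_" + cc;
}

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::printf("cudaGetDeviceProperties(%d) failed: %s\n",
                        dev, cudaGetErrorString(err));
            continue;
        }
        // For the GeForce 9800M GS in the question this reports 1.1.
        std::printf("Device %d (%s): compute capability %d.%d -> %s\n",
                    dev, prop.name, prop.major, prop.minor,
                    gencodeFor(prop.major, prop.minor).c_str());
    }
    return 0;
}
```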

The default Visual Studio project or Makefile may not target the right architecture, so always make sure that it does.

For Visual Studio, refer to this answer: https://stackoverflow.com/a/14413360/1043187

For a Makefile, it depends. The CUDA SDK samples often have a GENCODE_FLAGS variable that you can modify.
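After rebuilding, one way to confirm that the executable really contains code the GPU can use is to ask the runtime about a kernel. In this sketch (the file name, build line and ccToVersion() helper are illustrative assumptions), cudaFuncGetAttributes is expected to fail if the binary has neither SASS nor PTX usable on the device; otherwise it reports the binaryVersion and ptxVersion the kernel was built for, both encoded as major * 10 + minor.

```cuda
// verify_arch.cu: after rebuilding with the suggested -gencode option, check
// at run time that the executable carries code this GPU can run.
// Assumed build line for the CC 1.1 GPU in the question (CUDA 5.0 era):
//   nvcc -gencode arch=compute_11,code=sm_11 verify_arch.cu -o verify_arch
#include <cstdio>
#include <cuda_runtime.h>

__global__ void emptyKernel() {}

// Hypothetical helper: cudaFuncAttributes encodes versions as
// major * 10 + minor (e.g. 11 for 1.1); do the same for the device.
int ccToVersion(int major, int minor) { return major * 10 + minor; }

int main()
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Expected to fail when no SASS or PTX in the binary is usable on this device.
    cudaFuncAttributes attr;
    err = cudaFuncGetAttributes(&attr, emptyKernel);
    if (err != cudaSuccess) {
        std::printf("No usable kernel image for CC %d.%d: %s\n",
                    prop.major, prop.minor, cudaGetErrorString(err));
        return 1;
    }

    std::printf("device %d, kernel binaryVersion %d, ptxVersion %d\n",
                ccToVersion(prop.major, prop.minor),
                attr.binaryVersion, attr.ptxVersion);
    return 0;
}
```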

Upvotes: 1
