Reputation: 583
I have exactly the same problem as described in the post: CUDA Error on cudaBindTexture2D
I even get the same error, "error 18: invalid texture reference," and, like the original poster, the code "wouldn't throw the error on cudaMalloc, but only on cudaBindTexture".
Unfortunately, the poster (Anton Roth) answered his own question in a manner that is a bit too cryptic for someone like me who is just starting out with CUDA:
"The answer was in the comments, I used a sm that my GPU wasn't compatible to."
The "not compatible with GPU" makes sense since the sample program FluidsGL (called "Fluids (OpenGL Version)" in NVIDIA CUDA Samples Browser) fails on my laptop, but works fine on my desktop at work. Unfortunately, I still don't know what "in the comments" was referring it, or how to even check for GPU SM compatibilities.
Here is the code that seems to be causing the issue:
#define DIM 512
In main:
setupTexture(DIM, DIM);
bindTexture();
In fluidsGL_kernels.cu:
texture<float2, 2> texref;
static cudaArray *array = NULL;
void setupTexture(int x, int y)
{
// Wrap mode appears to be the new default
texref.filterMode = cudaFilterModeLinear;
cudaChannelFormatDesc desc = cudaCreateChannelDesc<float2>();
cudaMallocArray(&array, &desc, y, x);
getLastCudaError("cudaMalloc failed");
}
void bindTexture(void)
{
cudaBindTextureToArray(texref, array); // this call itself doesn't report the error; error 18 is caught by getLastCudaError below
getLastCudaError("cudaBindTexture failed");
}
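For reference, the failure can also be caught directly from the return value of cudaBindTextureToArray rather than via getLastCudaError (a minimal sketch, assuming <cstdio> is included):
cudaError_t err = cudaBindTextureToArray(texref, array);
if (err != cudaSuccess)
    printf("cudaBindTextureToArray: %s\n", cudaGetErrorString(err)); // prints "invalid texture reference" in this case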
Here is the output of deviceQuery:
Device 0: "GeForce 9800M GS"
CUDA Driver Version / Runtime Version 5.0 / 5.0
CUDA Capability Major/Minor version number: 1.1
Total amount of global memory: 1024 MBytes (1073741824 bytes)
( 8) Multiprocessors x ( 8) CUDA Cores/MP: 64 CUDA Cores
GPU Clock rate: 1325 MHz (1.32 GHz)
Memory Clock rate: 799 Mhz
Memory Bus Width: 256-bit
Max Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(8192) x 512, 2D=(8192,8192) x 512
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per multiprocessor: 768
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): No
Device PCI Bus ID / PCI location ID: 8 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = GeForce 9800M GS
I know my GPU is kind of old, but it still runs most of the examples pretty well.
Upvotes: 0
Views: 4267
Reputation: 8976
You need to compile your code for the proper architecture (as explained in the post you linked).
Since you have a CC 1.1 device, use the following nvcc compilation options:
-gencode arch=compute_11,code=sm_11
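For example, a complete command line might look like this (the file names are placeholders; substitute your own sources):
nvcc -gencode arch=compute_11,code=sm_11 -o fluidsGL fluidsGL.cu fluidsGL_kernels.cu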
The default Visual Studio project or Makefile may not compile for the proper architecture, so always make sure that it does.
For Visual Studio, refer to this answer: https://stackoverflow.com/a/14413360/1043187
For a Makefile, it depends. The CUDA SDK samples often have a GENCODE_FLAGS variable that you can modify.
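For example, pointing it at the CC 1.1 target discussed above could look something like this (the exact variable layout varies between SDK versions):
GENCODE_FLAGS := -gencode arch=compute_11,code=sm_11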
Upvotes: 1