Reputation: 21
I'm using CUDA Toolkit 8 with Visual Studio Community 2015. When I try simple vector addition from NVidia's PDF manual (minus error checking which I don't have the *.h's for) it always comes back as undefined values, which means the output array was never filled. When I pre-fill it with 0's, that's all I get at the end.
Others have had this problem and some people are saying it's caused by compiling for the wrong compute capability. However, I am using an NVidia GTX 750 Ti, which is supposed to be Compute Capability 5. I have tried compiling for Compute Capability 2.0 (the minimum for my SDK) and 5.0.
I also cannot make any of the precompiled examples work, such as vectoradd.exe which says, "Failed to allocate device vector A (error code initialization error)!" And oceanfft.exe says, "Error unable to find GLSL vertex and fragment shaders!" which doesn't make sense because GLSL and fragment shading are very basic features.
My driver version is 361.43 and other apps such as Blender Cycles in CUDA mode and Stellarium work perfectly.
Here is the code that should work:
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <iostream>
#include <algorithm>
#define N 10
__global__ void add(int *a, int *b, int *c) {
int tid = blockIdx.x; // handle the data at this index
if (tid < N)
c[tid] = a[tid] + b[tid];
}
int main(void) {
int a[N], b[N], c[N];
int *dev_a, *dev_b, *dev_c;
// allocate the memory on the GPU
cudaMalloc((void**)&dev_a, N * sizeof(int));
cudaMalloc((void**)&dev_b, N * sizeof(int));
cudaMalloc((void**)&dev_c, N * sizeof(int));
// fill the arrays 'a' and 'b' on the CPU
for (int i = 0; i<N; i++) {
a[i] = -i;
b[i] = i * i;
}
// copy the arrays 'a' and 'b' to the GPU
cudaMemcpy(dev_a, a, N * sizeof(int),cudaMemcpyHostToDevice);
cudaMemcpy(dev_b, b, N * sizeof(int),cudaMemcpyHostToDevice);
add << <N, 1 >> >(dev_a, dev_b, dev_c);
// copy the array 'c' back from the GPU to the CPU
cudaMemcpy(c, dev_c, N * sizeof(int),cudaMemcpyDeviceToHost);
// display the results
for (int i = 0; i<N; i++) {
printf("%d + %d = %d\n", a[i], b[i], c[i]);
}
// free the memory allocated on the GPU
cudaFree(dev_a);
cudaFree(dev_b);
cudaFree(dev_c);
return 0;
}
I'm trying to develop CUDA apps so any help would be greatly appreciated.
Upvotes: 2
Views: 845
Reputation: 72352
This was apparently caused by using an incompatible driver version with the CUDA 8 toolkit. Installing the driver distributed with the version 8 toolkit solved thr problem.
[Answer assembled from comments and added as a community wiki entry to get the question off the unanswered queue for the CUDA tag]
Upvotes: 1