Reputation: 1
I am testing the following code on my own local machines (Arch Linux and Ubuntu 16.04, both with NVIDIA driver 390 and CUDA 9.1) and on our local HPC clusters:
#include <cmath>
#include <cstdlib>
#include <iostream>
#include <cufft.h>

int main(){
    // Initializing variables
    int n = 1024;
    cufftHandle plan1d;
    double2 *h_a, *d_a;

    // Allocation / definitions: one period of a sine wave, zero imaginary part
    h_a = (double2 *)malloc(sizeof(double2)*n);
    for (int i = 0; i < n; ++i){
        h_a[i].x = sin(2*M_PI*i/n);
        h_a[i].y = 0;
    }
    cudaMalloc(&d_a, sizeof(double2)*n);
    cudaMemcpy(d_a, h_a, sizeof(double2)*n, cudaMemcpyHostToDevice);

    cufftResult result = cufftPlan1d(&plan1d, n, CUFFT_Z2Z, 1);
    // ignoring full error checking for readability
    if (result == CUFFT_INVALID_DEVICE){
        std::cout << "Invalid Device Error\n";
        exit(1);
    }

    // Executing FFT (in place)
    cufftExecZ2Z(plan1d, d_a, d_a, CUFFT_FORWARD);
    // Executing the iFFT (cuFFT does not normalize, so the result is scaled by n)
    cufftExecZ2Z(plan1d, d_a, d_a, CUFFT_INVERSE);

    // Copying back
    cudaMemcpy(h_a, d_a, sizeof(double2)*n, cudaMemcpyDeviceToHost);

    // Releasing the plan and the buffers
    cufftDestroy(plan1d);
    cudaFree(d_a);
    free(h_a);
    return 0;
}
I compile with nvcc cuda_test.cu -lcufft
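Since the code above skips most error checking, here is a minimal sketch of how every CUDA runtime and cuFFT call could be checked, so that the first failing call is reported together with its line number. The macros `CUDA_CHECK` and `CUFFT_CHECK` are hypothetical helpers written for this example; they are not part of CUDA or cuFFT. The sketch should build the same way, with nvcc and -lcufft.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper (not a CUDA API): abort with a message if a runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d CUDA error: %s\n",            \
                         __FILE__, __LINE__,                          \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical helper (not a cuFFT API): abort and print the numeric cufftResult.
#define CUFFT_CHECK(call)                                             \
    do {                                                              \
        cufftResult res_ = (call);                                    \
        if (res_ != CUFFT_SUCCESS) {                                  \
            std::fprintf(stderr, "%s:%d cuFFT error code %d\n",       \
                         __FILE__, __LINE__, static_cast<int>(res_)); \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

int main() {
    const int n = 1024;
    cufftDoubleComplex *d_a = nullptr;
    cufftHandle plan;

    // Allocate and zero the device buffer so the transform runs on defined data.
    CUDA_CHECK(cudaMalloc(&d_a, sizeof(cufftDoubleComplex) * n));
    CUDA_CHECK(cudaMemset(d_a, 0, sizeof(cufftDoubleComplex) * n));

    // Plan creation is the call that fails in the question.
    CUFFT_CHECK(cufftPlan1d(&plan, n, CUFFT_Z2Z, 1));
    CUFFT_CHECK(cufftExecZ2Z(plan, d_a, d_a, CUFFT_FORWARD));
    CUFFT_CHECK(cufftExecZ2Z(plan, d_a, d_a, CUFFT_INVERSE));

    // Kernel launches are asynchronous; synchronize to surface execution errors.
    CUDA_CHECK(cudaDeviceSynchronize());

    CUFFT_CHECK(cufftDestroy(plan));
    CUDA_CHECK(cudaFree(d_a));
    std::printf("All calls succeeded\n");
    return 0;
}
```

If an earlier call such as cudaMalloc already fails on the cluster, this would show that the problem is not specific to cuFFT.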
On both of my local machines, the code works just fine; however, I have tried using the same code on our HPC clusters and it will return the CUFFT_INVALID_DEVICE error on that hardware / configuration. Here's the hardware and driver configuration for those devices.
According to this, the CUDA versions should be compatible with the installed driver versions; however, I received a similar error on my local Ubuntu machine earlier, when my driver and CUDA installations were set up incorrectly.
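One way to check the driver and toolkit pairing from inside the program, rather than from documentation, is to print what the runtime actually sees on the failing node. The following is only a diagnostic sketch, assuming it is compiled and run in the same environment as the failing code; it uses cudaDriverGetVersion, cudaRuntimeGetVersion, cufftGetVersion, cudaGetDeviceCount and cudaGetDeviceProperties, and it does not by itself identify the cause of CUFFT_INVALID_DEVICE.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

int main() {
    int driverVersion = 0, runtimeVersion = 0, cufftVersion = 0;

    // Highest CUDA version supported by the installed driver (0 if no driver is found).
    cudaDriverGetVersion(&driverVersion);
    // Version of the CUDA runtime this binary is actually using.
    cudaRuntimeGetVersion(&runtimeVersion);
    // Version of the cuFFT library that was loaded (printed as the raw integer).
    cufftGetVersion(&cufftVersion);

    // CUDA versions are encoded as 1000 * major + 10 * minor.
    std::printf("Driver supports CUDA: %d.%d\n",
                driverVersion / 1000, (driverVersion % 1000) / 10);
    std::printf("Runtime CUDA version: %d.%d\n",
                runtimeVersion / 1000, (runtimeVersion % 1000) / 10);
    std::printf("cuFFT version (raw):  %d\n", cufftVersion);

    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // List every visible device with its compute capability.
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) == cudaSuccess) {
            std::printf("Device %d: %s (compute capability %d.%d)\n",
                        dev, prop.name, prop.major, prop.minor);
        }
    }
    return 0;
}
```

A runtime version higher than the version the driver supports, or a cuFFT version that does not match the toolkit used at compile time, would be worth investigating first.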
I am completely baffled at how to continue here and can only think of a few things:
As a note: this is just an example, but I have a larger codebase that does not seem to run due to this error and I am trying to figure out how to solve that issue currently.
Thanks for reading and let me know if you have any ideas on how to proceed!
EDIT -- added a comment and fixed a typo in main code, after Rob's comment.
Upvotes: 0
Views: 547
Reputation: 11
I have had a similar problem, and it turned out to be a conflict between the Cray compiler wrappers and the CUDA toolkit. What solved it for me was not loading the cudatoolkit module, enabling dynamic linking, and using the libraries provided by the compiler.
PS: I am using PGI Fortran 17.5, so not an exact match.
Upvotes: 1