Reputation: 21
I am debugging an MPI-based CUDA program with DDT. My code aborts when the CUDA runtime library (libcudart) throws an exception in the (undocumented) function cudaGetExportTable, called from cudaMalloc and cudaThreadSynchronize (UPDATED: using cudaDeviceSynchronize gives the same error) in my code.
Why is libcudart throwing an exception (I am using the C API, not the C++ API) before I can detect it in my code via its cudaError_t return value or with CHECKCUDAERROR?
(I'm using the CUDA 4.2 SDK for Linux.)
Output:
Process 9: terminate called after throwing an instance of 'cudaError_enum'
Process 9: terminate called recursively
Process 20: terminate called after throwing an instance of 'cudaError'
Process 20: terminate called recursively
My code:
cudaThreadSynchronize();
CHECKCUDAERROR("cudaThreadSynchronize()");
Other code fragment:
const size_t t;  // from argument to function
void* p = NULL;
const cudaError_t r = cudaMalloc(&p, t);
if (r != cudaSuccess) {
    ERROR("cudaMalloc failed.");
}
Partial Backtrace:
Process 9:
cudaDeviceSynchronize()
-> cudaGetExportTable()
-> __cxa_throw
Process 20:
cudaMalloc()
-> cudaGetExportTable()
-> cudaGetExportTable()
-> __cxa_throw
Memory debugging errors:
Processes 0,2,4,6-9,15-17,20-21:
Memory error detected in Malloc_cuda_gx (cudamalloc.cu:35):
dmalloc bad admin structure list.
Line 35 of cudamalloc.cu is the cudaMalloc call in the code fragment shown above. Also:
Processes 1,3,5,10-11,13-14,18-19,23:
Memory error detected in vfprintf from /lib64/libc.so.6:
dmalloc bad admin structure list.
Also, when running with 3 GPUs per node instead of 4, dmalloc detects similar memory errors; yet when not in debug mode, the code runs perfectly fine with 3 GPUs per node (as far as I can tell).
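(For context: each MPI rank uses one GPU. The sketch below only illustrates the kind of rank-to-GPU mapping involved; the modulo scheme and the names are assumptions, not my actual device-selection code.)
#include <cstdio>
#include <mpi.h>
#include <cuda_runtime.h>

// Illustrative sketch only (call after MPI_Init): map each MPI rank to one
// of the node's GPUs. The modulo mapping is an assumption, not my real code.
static int select_device_for_rank(void)
{
    int rank = 0, ndev = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    cudaGetDeviceCount(&ndev);            // 3 or 4 GPUs per node in my runs
    const int dev = rank % ndev;
    if (cudaSetDevice(dev) != cudaSuccess) {
        fprintf(stderr, "rank %d: cudaSetDevice(%d) failed\n", rank, dev);
        return -1;
    }
    return dev;
}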
Upvotes: 0
Views: 1007
Reputation: 21
Recompile with gcc. (I was using icc to compile my code.)
When I do this, the exception still appears while debugging, but continuing past it I get real CUDA errors:
Process 9: gadget_cuda_gx.cu:116: ERROR in gadget_cuda_gx.cu:919: CUDA ERROR: cudaThreadSynchronize(): unspecified launch failure
Process 20: cudamalloc.cu:38: ERROR all CUDA-capable devices are busy or unavailable, cudaMalloc failed to allocate 856792 bytes = 0.817101 Mb
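(Aside: "unspecified launch failure" is an asynchronous error from an earlier kernel launch that only surfaces at the next synchronization point. A minimal sketch of checking right after a launch, with a placeholder kernel and launch configuration that are not taken from my code:)
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummy_kernel(void) { }   // placeholder, not from my code

// Minimal sketch: surface an asynchronous launch error immediately after
// the launch instead of at a later cudaThreadSynchronize().
static int launch_and_check(void)
{
    dummy_kernel<<<1, 1>>>();
    cudaError_t e = cudaGetLastError();      // launch/configuration errors
    if (e == cudaSuccess)
        e = cudaDeviceSynchronize();         // errors during kernel execution
    if (e != cudaSuccess) {
        fprintf(stderr, "CUDA ERROR: %s\n", cudaGetErrorString(e));
        return 1;
    }
    return 0;
}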
Valgrind reveals no memory corruption or leaks in my code (whether compiled with gcc or icc), but it does find a few leaks inside libcudart.
UPDATE: Still not fixed. This appears to be the same problem reported in answer #2 to this thread: cudaMemset fails on __device__ variable. The runtime isn't working as it should, it seems...
Upvotes: 1