Kirill

Reputation: 470

What can cause this CUDA stack trace, and what is wrong with this call to cudaMemcpy?

My program is written in C++, uses GLUT and CUDA, and draws a small animation. It hangs after running for a while. When I interrupt it in the debugger a few seconds after it hangs, I see the following trace:

Program received signal SIGINT, Interrupt.
0x000000011302a84c in cuGraphicsGLRegisterBuffer ()
(gdb) bt
#0  0x000000011302a84c in cuGraphicsGLRegisterBuffer ()
#1  0x000000011306bc36 in cuGraphicsGLRegisterBuffer ()
#2  0x0000000113039455 in cuGraphicsGLRegisterBuffer ()
#3  0x0000000113006864 in cuGraphicsGLRegisterBuffer ()
#4  0x000000011303cbe6 in cuGraphicsGLRegisterBuffer ()
#5  0x000000011303d972 in cuGraphicsGLRegisterBuffer ()
#6  0x0000000113028bc6 in cuGraphicsGLRegisterBuffer ()
#7  0x000000011302a090 in cuGraphicsGLRegisterBuffer ()
#8  0x000000011301fcb2 in cuGraphicsGLRegisterBuffer ()
#9  0x0000000112ffcead in cuGraphicsGLRegisterBuffer ()
#10 0x0000000113001718 in cuGraphicsGLRegisterBuffer ()
#11 0x0000000112ff27cf in cuMemcpyDtoH_v2 ()
#12 0x00000001001d70c4 in cudaGetExportTable ()
#13 0x00000001002098a5 in cudaMemcpy ()

(This is the top of the stack trace; the rest is my own functions, one of which calls cudaMemcpy.)

If I interrupt it immediately after it hangs, the trace looks like this:

#0  0x00007fffffe0026d in __spin_lock ()
#1  0x00007fff880f855b in pthread_mutex_unlock ()
#2  0x000000011303ad89 in cuGraphicsGLRegisterBuffer ()
#3  0x000000011303b972 in cuGraphicsGLRegisterBuffer ()
#4  0x0000000113026bc6 in cuGraphicsGLRegisterBuffer ()
#5  0x0000000113028090 in cuGraphicsGLRegisterBuffer ()
#6  0x000000011301dcb2 in cuGraphicsGLRegisterBuffer ()
#7  0x0000000112ffaead in cuGraphicsGLRegisterBuffer ()
#8  0x0000000112fff718 in cuGraphicsGLRegisterBuffer ()
#9  0x0000000112ff07cf in cuMemcpyDtoH_v2 ()
#10 0x00000001001d70c4 in cudaGetExportTable ()
#11 0x00000001002098a5 in cudaMemcpy ()

I don't know how to approach this. cudaPeekAtLastError does not report any error before that call to cudaMemcpy. I also know that I can run the programs included in NVIDIA's SDK. Further, the program runs for several seconds before hanging, which means that all the cudaMemcpy calls before the hang execute without producing errors. So there doesn't seem to be any issue specifically with how I call cudaMemcpy, or with pointers being incorrectly allocated (if there were, I would expect CUDA to just generate an error, not hang).

The card is a GeForce 9400M, with CUDA driver/runtime 4.2 and compute capability 1.1.

Any advice?

Upvotes: 0

Views: 722

Answers (1)

Peter

Reputation: 14947

I'd guess you're having a pointer problem, such as trying to copy past the end of a buffer (either source or destination), or referencing a bad pointer altogether. Once you start stepping on invalid memory, don't expect any sane error reporting or useful backtrace.

Looking at your backtrace, the GLRegister frames could be showing up because you're unintentionally trying to copy from device memory that is mapped to OpenGL.

Try cuda-memcheck and/or valgrind. Or, since the hang is easily reproducible, start by verifying (with a debugger or printf) the values you're passing into cudaMemcpy. Or start manually binary-searching by disabling parts of the code until things work again.
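One way to do the "verify the values" step is to record the size of every allocation and check each planned copy against it before the copy runs. The following is only a minimal host-side sketch: `TrackedBuffer`, `copy_in_bounds`, and `check_copy` are hypothetical names, not part of CUDA, and it assumes you know the byte count you requested from cudaMalloc (or malloc) for each buffer. It is plain C++ so it can be compiled without the CUDA toolkit.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical record of one allocation: the base pointer plus the byte
// count that was requested when the buffer was allocated.
struct TrackedBuffer {
    const void* base;
    std::size_t bytes;
};

// True when [offset, offset + count) lies inside the allocation.
// Written as a subtraction so that offset + count cannot overflow.
bool copy_in_bounds(const TrackedBuffer& buf, std::size_t offset, std::size_t count) {
    return buf.base != nullptr && offset <= buf.bytes && count <= buf.bytes - offset;
}

// Checks both sides of a planned copy and prints the offending values
// to stderr when either side would run past the end of its buffer.
bool check_copy(const TrackedBuffer& dst, std::size_t dst_off,
                const TrackedBuffer& src, std::size_t src_off,
                std::size_t count, const char* label) {
    const bool ok = copy_in_bounds(dst, dst_off, count) &&
                    copy_in_bounds(src, src_off, count);
    if (!ok) {
        std::fprintf(stderr,
                     "%s: bad copy: count=%zu dst=%p+%zu (size %zu) src=%p+%zu (size %zu)\n",
                     label, count,
                     dst.base, dst_off, dst.bytes,
                     src.base, src_off, src.bytes);
    }
    return ok;
}
```

In the real program you would call `check_copy` right before each cudaMemcpy and stop (or break into the debugger) when it returns false. It only catches size and offset mistakes. It cannot tell whether a pointer is stale or currently mapped to OpenGL, so cuda-memcheck is still worth running.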

Upvotes: 1
