Bolster

Reputation: 7916

CUDA/PyCUDA: Diagnosing launch failure that disappears under cuda-gdb

Anyone know likely avenues of investigation for kernel launch failures that disappear when run under cuda-gdb? Memory assignments are within spec, launches fail on the same run of the same kernel every time, and (so far) it hasn't failed within the debugger.

Oh Great SO Gurus, What now?

Upvotes: 0

Views: 842

Answers (2)

fabmilo

Reputation: 48330

cuda-gdb can make some CUDA operations synchronous, which can hide bugs that depend on timing or ordering.

  • Are you reading from memory only after it has been initialized?
  • Are you using streams?
  • Are you launching more than one kernel?
  • Where and how does it fail?
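To answer the "where does it fail" question outside the debugger, one option is to force a synchronization after every launch, so an asynchronous failure is reported at the kernel that caused it rather than at some later call. A minimal sketch, assuming you wrap each launch in a callable; `checked_launch` and its arguments are hypothetical names, not part of PyCUDA:

```python
def checked_launch(label, launch, synchronize):
    """Run one launch, then block until the device is idle.

    Any error raised by the launch or by the synchronize call is
    re-raised with `label` attached, so the failing step is obvious.
    """
    try:
        launch()
        synchronize()
    except Exception as exc:
        raise RuntimeError(f"kernel launch {label!r} failed: {exc}") from exc
    return label


if __name__ == "__main__":
    # Stand-in callables so the sketch runs without a GPU.
    def failing_sync():
        raise ValueError("launch failed")

    checked_launch("step 1", lambda: None, lambda: None)
    try:
        checked_launch("step 2", lambda: None, failing_sync)
    except RuntimeError as err:
        print(err)
```

With PyCUDA, `synchronize` would typically be `pycuda.driver.Context.synchronize` and `launch` a lambda around your kernel call. If the failure also disappears with this in place, that points towards an ordering problem rather than bad indexing.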

Upvotes: 0

talonmies

Reputation: 72349

cuda-gdb spills all shared memory and registers to local memory. So when something runs OK when built for debugging and fails otherwise, it usually means an out-of-bounds shared memory access. cuda-memcheck might help, depending on what sort of card you are using. Fermi is better than older cards in that respect.
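If cuda-memcheck is not an option on your card, a host-side sanity check of the indexing can catch the same class of bug. The sketch below replays a kernel's shared-memory index expression for every thread in a block and reports any index outside the declared array. The helper name and the stencil example are hypothetical, and you would have to transcribe your own index expression by hand:

```python
def find_oob_shared_accesses(block_dim, shared_len, index_fn):
    """Return (thread_id, index) pairs whose index is outside [0, shared_len)."""
    bad = []
    for tid in range(block_dim):
        idx = index_fn(tid)
        if not 0 <= idx < shared_len:
            bad.append((tid, idx))
    return bad


# Hypothetical 1-D stencil: each thread reads its right-hand neighbour
# from a tile declared with exactly BLOCK_DIM elements.
BLOCK_DIM = 256
print(find_oob_shared_accesses(BLOCK_DIM, BLOCK_DIM, lambda tid: tid + 1))
# → [(255, 256)]
```

Here the last thread in the block reads one element past the end of the tile, which is exactly the kind of access that can go unnoticed in a debug build and fail in a release build.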

EDIT: Casting my mind back to the bad old days, I remember having an ornery GT9500 which used to throw similar NV13 errors and have random code failures when running very memory intensive kernels with a lot of shared memory activity. Never when debugging. I put it down to bad hardware and moved on to a GT200, never to see a similar error since. One possibility might be bad hardware. Is this a G92 (9800GT or similar)?

Upvotes: 2
