Reputation: 43149
I've got a C++ CUDA Toolkit v9.2 application that works fine when built with -O, but if I build with -g -G, I get CUDA error 7 at runtime:
too many resources requested for launch
I understand from here that this means:
the number of registers available on the multiprocessor is being exceeded. Reduce the number of threads per block to solve the problem.
I'd rather not reduce the threads per block since it works optimized. What might I do so that for debug builds I use fewer registers, more in line with optimized? How can I track down where the extra register use is coming from in my application?
Upvotes: 0
Views: 219
Reputation: 15941
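As also mentioned in the comments above, debug builds typically require more resources: -G turns off device-code optimizations, so kernels tend to use more registers and local memory than in an optimized build.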
You can use the --maxrregcount option or the __launch_bounds__ qualifier to limit how many registers the compiler is allowed to use. Do note that turning this knob really just means trading one resource for another. Forcing the compiler to use fewer registers will generally mean it has to spill more. More spills will generally mean increased local memory requirements. In extreme cases, you may run into another limit there…
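To see where the registers are going, it may help to query each kernel's resource usage at runtime and compare the numbers between the -O and -g -G builds. Below is a minimal sketch, assuming a hypothetical kernel named scaleKernel and a hypothetical block size of 256 (neither comes from the question). It bounds the kernel with __launch_bounds__ and prints what cudaFuncGetAttributes reports. Compiling with -Xptxas -v should print similar per-kernel figures at build time.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical block size; replace it with the value your application launches with.
#define THREADS_PER_BLOCK 256

// __launch_bounds__ promises the compiler that this kernel is never launched
// with more than THREADS_PER_BLOCK threads per block, so it has to keep
// per-thread register use low enough for such a launch to fit.
__global__ void __launch_bounds__(THREADS_PER_BLOCK)
scaleKernel(float* data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

int main()
{
    // Ask the runtime what the compiled kernel actually requires.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, scaleKernel);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaFuncGetAttributes failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    std::printf("registers per thread:       %d\n", attr.numRegs);
    // Local memory includes register spills, so this tends to grow when the
    // register limit is tightened.
    std::printf("local memory per thread:    %zu bytes\n", attr.localSizeBytes);
    std::printf("max threads per block here: %d\n", attr.maxThreadsPerBlock);
    return 0;
}
```

Running this once from each build should show which kernels grow in the debug build. If maxThreadsPerBlock comes back smaller than the block size you launch with, that kernel is a likely source of the "too many resources requested for launch" error.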
Upvotes: 1