Reputation: 429
I used --ptxas-options=-v while compiling my .cu code, and it gave the following:
ptxas info: Used 74 registers, 124 bytes smem, 16 bytes cmem[1]
deviceQuery for my card returns the following:
rev: 2.0
name: tesla c2050
total shared memory per block: 49152
total reg. per block: 32768
Now, I enter these values into the CUDA Occupancy Calculator as follows:
1.) 2.0
1.b) 49152
2.) threads per block: x
registers per thread: 74
shared memory per block (bytes): 124
I varied x (threads per block) so that x*74 <= 32768; for example, I entered 128 (or 256) in place of x. Am I entering all the values required by the occupancy calculator correctly? Thanks.
Upvotes: 1
Views: 901
Reputation: 11529
--ptxas-options=--verbose
(or -v) produces output of the following format:
ptxas : info : Compiling entry function '_Z13matrixMulCUDAILi16EEvPfS0_S0_ii' for 'sm_10'
ptxas : info : Used 15 registers, 2084 bytes smem, 12 bytes cmem[1]
The critical information is
<Registers Per Thread>, <Static Shared Memory Per Block>, <Constant Memory Per Kernel>
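When you fill in the occupancy calculator, enter the registers per thread as reported by ptxas. For shared memory per block, enter the static shared memory per block plus any dynamic shared memory passed as the third kernel launch parameter:
<<<GridDim, BlockDim, DynamicSharedMemoryPerBlock, Stream>>>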
The Occupancy Calculator Help tab contains additional information.
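In your example I believe field 1 (compute capability) does not match the target the kernel was compiled for. The Fermi architecture (sm_2*) is limited to 63 registers per thread, whereas sm_1* supports a limit of 124 registers per thread. A report of 74 registers therefore suggests the kernel was compiled for an sm_1* target. Recompiling for the card's architecture (for example with -arch=sm_20) should give the register count to enter for compute capability 2.0.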
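As a rough illustration of the checks described in this answer, here is a minimal host-side C++ sketch. The names `KernelUsage`, `DeviceLimits`, and `inputsAreConsistent` are hypothetical and not part of any CUDA API. The limits are the compute capability 2.0 figures quoted in this thread. The sketch only tests whether the inputs are consistent with those limits; it does not reproduce the calculator's occupancy result, which also depends on allocation granularity and per-multiprocessor limits.

```cpp
// Hypothetical container for the values entered into the calculator.
struct KernelUsage {
    int regsPerThread;     // from ptxas: "Used N registers"
    int staticSmemBytes;   // from ptxas: "N bytes smem"
    int dynamicSmemBytes;  // third <<<...>>> launch parameter (0 if unused)
    int threadsPerBlock;   // the x being varied
};

// Limits for compute capability 2.0, taken from the figures in this thread.
struct DeviceLimits {
    int maxRegsPerThread = 63;     // Fermi per-thread register limit
    int regsPerBlock     = 32768;  // "total reg. per block" from deviceQuery
    int smemPerBlock     = 49152;  // "total shared memory per block"
};

// Returns true only if the inputs could describe a kernel that was
// compiled for, and can launch on, a device with these limits.
bool inputsAreConsistent(const KernelUsage& k, const DeviceLimits& d) {
    // More registers per thread than the architecture allows means the
    // ptxas output came from a build for a different target.
    if (k.regsPerThread > d.maxRegsPerThread) return false;
    // A single block must fit in the register file.
    if (k.regsPerThread * k.threadsPerBlock > d.regsPerBlock) return false;
    // Static plus dynamic shared memory must fit in the per-block budget.
    if (k.staticSmemBytes + k.dynamicSmemBytes > d.smemPerBlock) return false;
    return true;
}
```

With the numbers from the question (74 registers, 124 bytes of shared memory, 128 threads per block), the first check fails, which is the mismatch pointed out above.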
Upvotes: 4