tiki

Reputation: 429

CUDA Occupancy Calculator

I used --ptxas-options=-v while compiling my .cu code, and it gave the following:

ptxas info: Used 74 registers, 124 bytes smem, 16 bytes cmem[1]

deviceQuery for my card returns the following:

rev: 2.0
name: Tesla C2050
total shared memory per block: 49152
total registers per block: 32768

Now I entered these values into the CUDA Occupancy Calculator as follows:

1.) Select Compute Capability: 2.0
1.b) Shared Memory Size Config (bytes): 49152
2.) Threads Per Block: x
    Registers Per Thread: 74
    Shared Memory Per Block (bytes): 124

I varied x (threads per block) so that x * 74 <= 32768; for example, I entered 128 (or 256) for x. Am I entering all the values the occupancy calculator requires correctly? Thanks.
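The bound used in the question can be sketched as below. This is only the naive check described above (registers per block divided by registers per thread, rounded down to a whole number of 32-thread warps); the real calculator also rounds register usage up to its allocation granularity, so its limit can be lower. The function name `naive_max_threads` is hypothetical.

```python
WARP_SIZE = 32


def naive_max_threads(regs_per_block: int, regs_per_thread: int) -> int:
    """Largest warp multiple x with x * regs_per_thread <= regs_per_block.

    Ignores register allocation granularity, so treat it as an upper bound only.
    """
    max_threads = regs_per_block // regs_per_thread
    # Round down to a whole number of warps.
    return (max_threads // WARP_SIZE) * WARP_SIZE


# Values from the question: 32768 registers per block, 74 registers per thread.
print(naive_max_threads(32768, 74))  # 416
```

Both 128 and 256 fall under this naive bound, which is why they looked valid in the question.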

Upvotes: 1

Views: 901

Answers (1)

Greg Smith

Reputation: 11529

--ptxas-options=--verbose (or --ptxas-options=-v) produces output of the form:

ptxas : info : Compiling entry function '_Z13matrixMulCUDAILi16EEvPfS0_S0_ii' for 'sm_10'
ptxas : info : Used 15 registers, 2084 bytes smem, 12 bytes cmem[1]

The critical information is:

  • The 1st line gives the target architecture.
  • The 2nd line gives <Registers Per Thread>, <Static Shared Memory Per Block> and <Constant Memory Per Kernel>.
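A small sketch of pulling those fields out of the ptxas output with Python's `re` module. The regular expressions assume exactly the line format in the sample above; other ptxas versions may word the lines differently (for example, leaving out a field that is zero), so treat the patterns as an assumption rather than a stable format. The function name and dictionary keys are hypothetical.

```python
import re

# Patterns for the two ptxas lines shown above (format assumed from that sample).
ENTRY_RE = re.compile(r"Compiling entry function '([^']+)' for '(sm_\d+)'")
USAGE_RE = re.compile(
    r"Used (\d+) registers, (\d+) bytes smem, (\d+) bytes cmem\[(\d+)\]"
)


def parse_ptxas(output: str) -> dict:
    """Extract target arch, registers, static smem and cmem from ptxas -v output."""
    info = {}
    entry = ENTRY_RE.search(output)
    if entry:
        info["kernel"], info["arch"] = entry.group(1), entry.group(2)
    usage = USAGE_RE.search(output)
    if usage:
        info["registers_per_thread"] = int(usage.group(1))
        info["static_smem_per_block"] = int(usage.group(2))
        info["cmem_bytes"] = int(usage.group(3))
        info["cmem_index"] = int(usage.group(4))  # number printed in cmem[...]
    return info


sample = """\
ptxas : info : Compiling entry function '_Z13matrixMulCUDAILi16EEvPfS0_S0_ii' for 'sm_10'
ptxas : info : Used 15 registers, 2084 bytes smem, 12 bytes cmem[1]
"""
print(parse_ptxas(sample))
```

The same usage pattern also matches the single line quoted in the question, but that line carries no architecture, which is exactly the piece needed for field 1.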

When you fill in the occupancy calculator:

  • Set field 1.) Select Compute Capability to 'sm_10' (compute capability 1.0) in the above example.
  • Set field 2.) Registers Per Thread to <Registers Per Thread>.
  • Set field 2.) Shared Memory Per Block to <Static Shared Memory Per Block> + DynamicSharedMemoryPerBlock, where DynamicSharedMemoryPerBlock is the 3rd parameter of <<<GridDim, BlockDim, DynamicSharedMemoryPerBlock, Stream>>>.
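As a sketch of the mapping in the list above, the hypothetical helper below turns the ptxas numbers and the launch configuration into the values for field 2. The dynamic shared memory size is whatever the host code passes as the 3rd <<<...>>> parameter; the question does not show a launch, so the usage example assumes 0 bytes of dynamic shared memory.

```python
def calculator_inputs(registers_per_thread: int,
                      static_smem_per_block: int,
                      dynamic_smem_per_block: int,
                      threads_per_block: int) -> dict:
    """Map ptxas numbers plus the launch configuration to field 2 inputs."""
    return {
        "Threads Per Block": threads_per_block,
        "Registers Per Thread": registers_per_thread,
        # Static smem reported by ptxas plus the dynamic size passed as the
        # 3rd <<<...>>> launch parameter.
        "Shared Memory Per Block (bytes)": static_smem_per_block + dynamic_smem_per_block,
    }


# Question's kernel: 74 registers, 124 bytes static smem; dynamic smem assumed 0.
print(calculator_inputs(74, 124, 0, 128))
```

If the kernel were launched with, say, 1024 bytes of dynamic shared memory, the shared memory field would be 124 + 1024 = 1148 bytes rather than the 124 bytes ptxas reports.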

The Occupancy Calculator Help tab contains additional information.

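In your example, I believe you are not setting field 1 correctly. The Fermi architecture (sm_2x) is limited to 63 registers per thread, so a kernel that uses 74 registers cannot have been compiled for compute capability 2.0; sm_1x supports a limit of 124 registers per thread.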
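The reasoning above can be checked mechanically by comparing the register count ptxas reported against each architecture's per-thread limit. The table holds only the two limits quoted in this answer; the names `MAX_REGS_PER_THREAD` and `archs_consistent_with` are illustrative.

```python
# Maximum registers per thread by compute capability major version,
# using the limits quoted above (sm_1x: 124, Fermi sm_2x: 63).
MAX_REGS_PER_THREAD = {1: 124, 2: 63}


def archs_consistent_with(registers_per_thread: int) -> list:
    """Return the architecture families whose limit allows this register count."""
    return [
        f"sm_{major}x"
        for major, limit in sorted(MAX_REGS_PER_THREAD.items())
        if registers_per_thread <= limit
    ]


print(archs_consistent_with(74))  # ['sm_1x']
print(archs_consistent_with(15))  # ['sm_1x', 'sm_2x']
```

With 74 registers only the sm_1x limit is satisfied, which is why entering 2.0 in field 1 does not match the ptxas output in the question.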

Upvotes: 4
