user961614

Reputation: 11

Why does a CUDA block size of 256 or 512 give better performance than other block sizes?

I've written a few programs in CUDA C on Windows 7 and experimented with the block size. In most cases, a block size of 256 or 512 gives better performance than the others, while other block sizes that are also multiples of 32 (the warp size) perform worse. Can anybody tell me the exact technical reason behind this, or point me to a resource that explains it? Thanks in advance.

Upvotes: 1

Views: 2278

Answers (1)

ArchaeaSoftware

Reputation: 4422

Without actual measurements, there's no way to be sure of the optimal block size for a given chip. If you are doing 2D texturing, for example, a 16x4 block happens to work really well. In your case, it's possible that 512 happens to be a multiple of the number of memory partitions in the chip. (On the GeForce 8800 GTX, with 6 memory partitions, 384 was a really good block size for bandwidth-bound kernels).
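Since the only reliable answer comes from measurement, a simple block-size sweep is a reasonable first step. Below is a minimal sketch in host-side C++ (it compiles as part of a `.cu` file). The names `sweepBlockSizes`, `SweepResult`, and `fastestBlockSize` are hypothetical, and the sketch assumes the callback you pass in launches your kernel with the given block size and waits for it to finish. It runs one untimed warm-up per candidate and then keeps the fastest of several timed runs.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <limits>
#include <vector>

struct SweepResult {
    int blockSize;  // threads per block that was tried
    double bestMs;  // fastest of the timed repetitions, in milliseconds
};

// Runs `run(blockSize)` once untimed (warm-up), then `reps` timed times per
// candidate, keeping the fastest run. `run` must not return until the GPU
// work has finished (e.g. launch, then cudaDeviceSynchronize()); otherwise
// only the asynchronous launch is measured.
std::vector<SweepResult> sweepBlockSizes(const std::vector<int>& candidates,
                                         const std::function<void(int)>& run,
                                         int reps = 5) {
    assert(reps > 0);
    std::vector<SweepResult> results;
    for (int bs : candidates) {
        run(bs);  // warm-up: the first launch may include one-time setup costs
        double best = std::numeric_limits<double>::infinity();
        for (int r = 0; r < reps; ++r) {
            auto t0 = std::chrono::steady_clock::now();
            run(bs);
            auto t1 = std::chrono::steady_clock::now();
            best = std::min(best, std::chrono::duration<double, std::milli>(t1 - t0).count());
        }
        results.push_back({bs, best});
    }
    return results;
}

// Block size with the lowest measured time (0 if nothing was measured).
int fastestBlockSize(const std::vector<SweepResult>& results) {
    auto it = std::min_element(results.begin(), results.end(),
                               [](const SweepResult& a, const SweepResult& b) {
                                   return a.bestMs < b.bestMs;
                               });
    return it == results.end() ? 0 : it->blockSize;
}
```

In a CUDA program, the callback would typically be a lambda that launches a kernel as `myKernel<<<(n + bs - 1) / bs, bs>>>(...)` and then calls `cudaDeviceSynchronize()` (`myKernel` and `n` are placeholders). Taking the minimum rather than the mean filters out runs disturbed by other activity on the machine.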

Occupancy is just one of many considerations that affect performance, and more threads is not always better. For workloads that can hold intermediate results in registers (instead of shared memory), blocks that use more registers and fewer threads work best.
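To see how block size alone affects occupancy, here is a minimal sketch of the thread/block-limit part of the usual theoretical-occupancy calculation, in host-side C++. The `SmLimits` defaults are assumed to be the compute capability 2.x (Fermi) per-SM limits (1024 threads per block, 48 resident warps, 8 resident blocks); substitute your own GPU's documented limits. The sketch deliberately ignores register and shared-memory usage, which in real kernels often lower occupancy further.

```cpp
#include <algorithm>
#include <cassert>

// Per-SM limits that bound how many blocks can be resident at once.
// Assumed defaults: compute capability 2.x (Fermi). Check your GPU's limits.
struct SmLimits {
    int maxThreadsPerBlock = 1024;
    int maxWarpsPerSm = 48;  // 1536 resident threads
    int maxBlocksPerSm = 8;
};

constexpr int kWarpSize = 32;

// Resident warps per SM for a given block size, considering ONLY the warp
// and block-count limits (registers and shared memory are ignored).
int residentWarps(int blockSize, const SmLimits& sm = SmLimits{}) {
    assert(blockSize > 0 && blockSize <= sm.maxThreadsPerBlock);
    // A partial warp still occupies a whole warp slot.
    int warpsPerBlock = (blockSize + kWarpSize - 1) / kWarpSize;
    int blocks = std::min(sm.maxBlocksPerSm, sm.maxWarpsPerSm / warpsPerBlock);
    return blocks * warpsPerBlock;
}

// Fraction of the SM's warp slots that are filled (theoretical occupancy).
double occupancy(int blockSize, const SmLimits& sm = SmLimits{}) {
    return static_cast<double>(residentWarps(blockSize, sm)) / sm.maxWarpsPerSm;
}
```

Under these assumed Fermi limits, 256 and 512 fill all 48 warp slots while 128 and 1024 fill only 32 (two thirds). However, 192 and 384 also reach 100%, so occupancy by itself does not single out 256 and 512. With Kepler-class limits (compute capability 3.x: 64 warps and 16 blocks per SM), 128, 256, 512 and 1024 all reach 100% while 192 and 384 drop to 60 of 64 warps. This illustrates why the "best" size depends on the chip, and why full occupancy still has to be confirmed by measurement.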

Sorry I can't give a more definitive answer, but it is a complicated issue.

Upvotes: 2
