Reputation: 525
Assume that we have enough global memory. Does replacing int with short improve performance in CUDA? (For example, does short reduce the usage of shared memory, registers, etc.?)
Any advice is welcome. Thanks.
Upvotes: 3
Views: 1970
Reputation: 48330
It depends:
If your program is memory bound, then yes: transferring the input as shorts could be beneficial, because half as many bytes have to be moved.
If your kernel is compute bound, the answer is more likely no, because the kernel has to do extra work to convert from short to int and then back to short each time.
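To make the trade-off concrete, here is a minimal sketch of a kernel that keeps its data as short in global memory but does the arithmetic in int. The names `scale_short` and `run_scale_short_demo`, the array size, and the `factor` value are all made up for illustration. Whether this is faster than an int version depends on whether your kernel is limited by memory traffic or by arithmetic, so you would have to measure it.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: data lives in global memory as short,
// but the computation itself is done in a 32-bit int.
__global__ void scale_short(const short* in, short* out, int n, int factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int v = in[i];                    // widening load: short -> int
        v = v * factor + 1;               // arithmetic on a 32-bit value
        out[i] = static_cast<short>(v);   // narrowing store: int -> short
    }
}

// Hypothetical host helper: runs the kernel and compares against a CPU reference.
// Returns true when every element matches.
bool run_scale_short_demo()
{
    const int n = 1 << 16;
    const int factor = 3;
    std::vector<short> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i)
        h_in[i] = static_cast<short>(i % 1000);   // small values, no overflow

    const size_t bytes = n * sizeof(short);       // half the bytes of an int array
    short* d_in = nullptr;
    short* d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }

    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    scale_short<<<(n + 255) / 256, 256>>>(d_in, d_out, n, factor);
    bool ok = cudaMemcpy(h_out.data(), d_out, bytes,
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_in);
    cudaFree(d_out);

    for (int i = 0; ok && i < n; ++i)
        ok = (h_out[i] == static_cast<short>(h_in[i] * factor + 1));
    return ok;
}
```

Calling `run_scale_short_demo()` from `main` and timing it against the same kernel written with int arrays is one way to see which case your workload falls into.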
Upvotes: 4
Reputation: 4422
Tesla-class hardware (SM 1.x) has surprisingly rich support for "half registers," so you might get some mileage from using short instead of int on those platforms. You can confirm by using cuobjdump to look at the microcode in the cubin. But Fermi removed that support.
With SM 2.1, NVIDIA added support for "video" instructions that implement 32-bit-wide SIMD operations on 32-bit registers - see section 8.7.9 of the PTX 2.1 spec.
http://developer.download.nvidia.com/compute/cuda/3_1/toolkit/docs/ptx_isa_2.1.pdf
Upvotes: 1
Reputation: 5209
Using short in shared memory will most likely reduce performance due to bank conflicts, unless you use short2.
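As a rough sketch of the short2 idea: if each thread reads and writes a whole short2, it touches one full 32-bit word of shared memory, so neighbouring threads do not share a bank word the way they would with plain short elements. The kernel name `reverse_pairs`, the helper `run_short2_demo`, and the block size of 128 are assumptions chosen for the example, and how much bank conflicts actually cost depends on the GPU generation.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define BLOCK 128   // threads per block (arbitrary choice for this sketch)

// Hypothetical kernel: reverses the 2*BLOCK shorts handled by each block.
// Every thread accesses shared memory as one short2, i.e. one 32-bit word.
__global__ void reverse_pairs(const short2* in, short2* out)
{
    __shared__ short2 tile[BLOCK];
    int t = threadIdx.x;
    int base = blockIdx.x * BLOCK;

    tile[t] = in[base + t];              // one 32-bit store per thread
    __syncthreads();

    short2 v = tile[BLOCK - 1 - t];      // one 32-bit load per thread
    out[base + t] = make_short2(v.y, v.x);
}

// Hypothetical host helper: returns true when the GPU output is the
// per-block reversal of the input.
bool run_short2_demo()
{
    const int blocks = 4;
    const int pairs = blocks * BLOCK;
    std::vector<short2> h_in(pairs), h_out(pairs);
    for (int i = 0; i < pairs; ++i)
        h_in[i] = make_short2(static_cast<short>(2 * i),
                              static_cast<short>(2 * i + 1));

    const size_t bytes = pairs * sizeof(short2);
    short2* d_in = nullptr;
    short2* d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }

    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    reverse_pairs<<<blocks, BLOCK>>>(d_in, d_out);
    bool ok = cudaMemcpy(h_out.data(), d_out, bytes,
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_in);
    cudaFree(d_out);

    for (int b = 0; ok && b < blocks; ++b) {
        for (int t = 0; ok && t < BLOCK; ++t) {
            short2 got = h_out[b * BLOCK + t];
            short2 src = h_in[b * BLOCK + (BLOCK - 1 - t)];
            ok = (got.x == src.y) && (got.y == src.x);
        }
    }
    return ok;
}
```

The same pattern works for global memory: loading a short2 per thread moves two values in a single 32-bit transaction.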
Also, as far as I know, all registers on the GPU are 32-bit, so it's unlikely that using short would reduce register usage.
Upvotes: 4