Reputation: 35521
How can I estimate the cuda performance of cards that I don't own, ie. new cards?
For instance I found an incomplete Cuda example and the author wrote, that it takes him 0,7 s on his GF 8600 GT. But on my Quadro it takes 1,7s.
My question is: Is the code which I used to fill the gaps faulty or is the GF 8600 really twice as fast?
The kernel is memory bound, but my card has an higher memory bandwidth. I don't know what conclusions to draw from this.
Name Quadro FX 580 GeForce 8600 GT
CUDA Cores 32 32
Core clock (MHz) 450 540
Memory clock (MHz) 400 700
Memory BW (GB/s) 25.6 22.4
Shader Clock (MHz) ???? 1180
Upvotes: 3
Views: 899
Reputation: 6753
Just want to provide you with some pointers that may be possible sources of error. Firstly, use cudaEvents to time your code, not cuda profiler as cudaEvents is more accurate. Secondly, please check what the author is measuring; is he only talking about the computation time, or is he also considering the time to transfer data to and from the GPU. Are you measuring the same time?
Secondly, the cuda architecture is changing quite fast. For example, for cards with cc 1.x, it is suggested that we should use shared memory to get better performance; however, for cards with cc 2.x, there is a L1 cache with each multiprocessor that makes global memory accesses quite fast. So, you may aslo want to compare the architecture of the two cards and their compute capabilities.
Upvotes: 2