Reputation: 25
I have several questions about CUDA programming and GPU architecture:
1. Given that the memory bandwidth of the GPU is 144 Gb/s and the PCIe bus bandwidth is 2.25 Gb/s, how many milliseconds should it take to transfer an array of 100,000,000 doubles to the GPU?
2. Given that the size of the GPU global memory is 3 Gb, what is the maximum array size that you can process? If you had to process a longer array, how could you change your program to accomplish this?
I don't know how to calculate these. Can anyone help? Thanks.
Upvotes: 1
Views: 2370
Reputation: 151934
The PCIe bus will be the limiting factor here. Divide the total data transfer size (in bytes) by the bus speed (in bytes/sec) to get the duration in seconds, then multiply by 1000 to get milliseconds. 2.25 Gb/s doesn't look like a typical PCIe transfer speed that I am aware of, but it may be the case on your system. Modern systems can usually reach ~6GB/s (for a PCIe Gen2 x16 link) or ~11GB/s (for a PCIe Gen3 x16 link). You can measure the transfer speed achievable on your system with the bandwidthTest CUDA sample code. Note that to get peak transfer throughput in your application, it is usually necessary to transfer to/from a pinned (page-locked) host allocation.
If a GPU has 3GB of memory in total, some of that will be used up by CUDA and other system overheads. The remaining "free" amount can be estimated using either the nvidia-smi utility or the cudaMemGetInfo() runtime API call. The free memory is approximately an upper bound on the total data storage possible; the amount you can actually allocate will be somewhat less. Once you determine or estimate the amount you can allocate, divide this quantity (in bytes) by the size of the data element you want to store. For example, a double takes up 8 bytes of storage; the C sizeof operator can be used to discover this. The result is the total number of elements that can be stored in that amount of memory. The number that is actually workable will be somewhat less than this estimate.
Upvotes: 2