Reputation: 199
On a compute capability 2.x device how would I make sure that the gpu uses coalesced memory access when using mapped pinned memory and assuming that normally when using global memory the 2D data would require padding?
I can't seem to find information about this anywhere, perhaps I should be looking better or perhaps I am missing something. Any pointers in the right direction are welcome...
Upvotes: 1
Views: 375
Reputation: 21455
The coalescing approach should be applied when using zero copy memory. Quoting the CUDA C BEST PRACTICES GUIDE:
Because the data is not cached on the GPU, mapped pinned memory should be read or written only once, and the global loads and stores that read and write the memory should be coalesced.
Quoting the "CUDA Programming" book, by S. Cook
If you think about what happens with access to global memory, an entire cache line is brought in from memory on compute 2.x hardware. Even on compute 1.x hardware the same 128 bytes, potentially reduced to 64 or 32, is fetched from global memory. NVIDIA does not publish the size of the PCI-E transfers it uses, or details on how zero copy is actually implemented. However, the coalescing approach used for global memory could be used with PCI-E transfer. The warp memory latency hiding model can equally be applied to PCI-E transfers, providing there is enough arithmetic density to hide the latency of the PCI-E transfers.
Upvotes: 2