Reputation: 3769
I am a bit confused about how the memory accesses issued by a warp are affected by FP64 data.
Here is my question now:
PS: I am mostly interested in compute capability 2.0+ architectures.
Upvotes: 3
Views: 349
Reputation: 72349
A warp always consists of 32 threads regardless of whether these threads are doing FP32 or FP64 calculations. Right?
Correct
I have read that each time a thread in a warp tries to read from or write to global memory, the warp accesses 128 bytes (32 single-precision floats). Right?
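Not exactly. There are also 32-byte transaction sizes. On compute capability 2.x, for example, accesses cached in L1 are serviced with 128-byte transactions, while accesses cached only in L2 are serviced with 32-byte transactions.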
So if all the threads in a warp are reading different single-precision floats (a total of 128 bytes) from memory, but in a coalesced manner, the warp will issue a single memory transaction. Right?
Correct
What if all threads in the warp try to access different double-precision floats (a total of 256 bytes) in a coalesced manner? Will the warp issue two memory transactions (128+128)?
Yes. The compiler will emit a 64-bit load instruction, which will be serviced by two 128-byte transactions per warp when coalesced memory access is possible.
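To make the arithmetic concrete, here is a rough host-side sketch (plain Python, not CUDA) that counts how many aligned segments one warp-wide access touches. It is a simplified model, assuming each transaction covers exactly one aligned segment and ignoring caching and any architecture-specific details. The function name `segments_touched` and its parameters are hypothetical, made up for illustration only.

```python
WARP_SIZE = 32  # threads per warp

def segments_touched(elem_bytes, segment_bytes=128, base_addr=0, stride_elems=1):
    """Count the distinct aligned segments covered by one warp-wide access.

    Simplified model: lane i accesses elem_bytes bytes starting at
    base_addr + i * stride_elems * elem_bytes.
    """
    segments = set()
    for lane in range(WARP_SIZE):
        first = base_addr + lane * stride_elems * elem_bytes
        last = first + elem_bytes - 1
        # A misaligned element may straddle a segment boundary,
        # so record every segment between its first and last byte.
        segments.update(range(first // segment_bytes, last // segment_bytes + 1))
    return len(segments)

print(segments_touched(4))                    # coalesced floats, 128-byte segments
print(segments_touched(8))                    # coalesced doubles, 128-byte segments
print(segments_touched(4, segment_bytes=32))  # coalesced floats, 32-byte segments
print(segments_touched(8, segment_bytes=32))  # coalesced doubles, 32-byte segments
```

Under this model, 32 coalesced floats fall in one 128-byte segment and 32 coalesced doubles fall in two, matching the answers above. With 32-byte segments the same accesses touch four and eight segments respectively. Changing `base_addr` or `stride_elems` shows how misaligned or strided access increases the count.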
Upvotes: 2