Reputation: 131546
This is a question about discrete GPUs, mostly recent ones (NVIDIA Kepler and Maxwell, and whatever's in AMD Kaveri and the R9 290s).
How long does it take to load an otherwise-uncached element into a register from...
A link to a table somewhere would be great, an explanation would be ok...
Upvotes: 5
Views: 2462
Reputation: 8827
It varies with the GPU, its generation, how it is integrated (e.g. over PCIe), and other factors. I often work with ASM, and these are the numbers I work with:
- Global device memory? Around 300-800 clock cycles. (Motherboard-mounted GPUs that use main memory, as in laptops, have slower memory.)
- Global memory L2 cache? Around 100 clock cycles.
- Texture cache(s)? My guess is 50-100 clock cycles.
- Constant cache(s)? Around 1-3 clock cycles on a cache hit. On a miss, the access goes to the L2 cache (~50-100 clock cycles) or even to global memory (300-500 clock cycles).
- Per-core (i.e. per-SMX/SMM in Kepler/Maxwell) L1 cache? Around 1-3 clock cycles.
- Per-core (i.e. per-SMX/SMM in Kepler/Maxwell) shared memory? Around 1-3 clock cycles.
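To compare these cycle counts with wall-clock figures, they can be converted using the core clock. The snippet below is only a sketch: `CORE_CLOCK_GHZ` is a hypothetical value (substitute your own GPU's clock), the names are made up for illustration, and the cycle ranges are simply the ones from the list above.

```python
# Convert the cycle ranges quoted above into nanoseconds.
# ASSUMPTION: CORE_CLOCK_GHZ is a hypothetical clock, not a measured one.
CORE_CLOCK_GHZ = 1.0

# (low, high) latency in clock cycles, copied from the list above.
LATENCY_CYCLES = {
    "global memory": (300, 800),
    "L2 cache": (100, 100),
    "texture cache": (50, 100),
    "constant cache (hit)": (1, 3),
    "L1 cache": (1, 3),
    "shared memory": (1, 3),
}


def cycles_to_ns(cycles, clock_ghz=CORE_CLOCK_GHZ):
    """One cycle lasts 1 / clock_ghz nanoseconds."""
    return cycles / clock_ghz


def latency_ns(clock_ghz=CORE_CLOCK_GHZ):
    """Return {memory type: (low_ns, high_ns)} for the given clock."""
    return {
        name: (cycles_to_ns(lo, clock_ghz), cycles_to_ns(hi, clock_ghz))
        for name, (lo, hi) in LATENCY_CYCLES.items()
    }


if __name__ == "__main__":
    for name, (lo, hi) in latency_ns().items():
        print(f"{name:22s} {lo:7.1f} - {hi:7.1f} ns")
```

At a higher clock the same cycle count takes less time, so cycle counts from different GPUs are not directly comparable as durations.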
I also searched online to see how close I was and found this poster, whose numbers differ from mine: http://lpgpu.org/wp/wp-content/uploads/2013/05/poster_andresch_acaces2014.pdf
I think the actual latency and the number a programmer should work with are two different things, because multithreading lets other threads run while one is waiting on memory. Hope this helps.
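One way to picture the multithreading point is a very simplified scheduling model. Suppose each warp issues some independent work and then stalls on a load. With enough resident warps, the scheduler always has another warp ready, so the raw latency is hidden. The function name and the model below are assumptions for illustration; real hardware also depends on issue width, memory-level parallelism, and occupancy limits.

```python
import math


def warps_to_hide(latency_cycles, busy_cycles_per_warp):
    """Estimate resident warps needed to keep one scheduler busy.

    Simplified model (an assumption, not a hardware spec): each warp
    issues `busy_cycles_per_warp` cycles of work, then waits
    `latency_cycles` for its load. N warps keep the scheduler busy
    when N * busy >= busy + latency, i.e. N >= 1 + latency / busy.
    """
    if busy_cycles_per_warp <= 0:
        raise ValueError("busy_cycles_per_warp must be positive")
    return 1 + math.ceil(latency_cycles / busy_cycles_per_warp)


if __name__ == "__main__":
    # Latencies (in cycles) picked from the ranges quoted in the answer.
    for latency in (3, 100, 400, 800):
        print(latency, "cycles ->", warps_to_hide(latency, 10), "warps")
```

Under this model, a load with a latency of several hundred cycles needs dozens of warps in flight to be hidden, while a load of a few cycles needs almost none. That is one reason the latency a programmer observes can differ from the latency of a single isolated access.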
Upvotes: 4