Zk1001

Reputation: 2053

Fermi L2 cache hit latency?

Does anyone know anything about the L2 cache in Fermi? I have heard that it is as slow as global memory, and that L2 exists only to enlarge the effective memory bandwidth, but I can't find any official source confirming this. Has anyone measured the L2 hit latency? What about its size, line size, and other parameters?

In effect, how do L2 read misses affect performance? My feeling is that L2 only matters for very memory-bound applications. Please feel free to give your opinions.

Thanks

Upvotes: 4

Views: 2606

Answers (2)

Grizzly

Reputation: 20211

This thread in the NVIDIA forums has some measurements of the performance characteristics. While it is not official information, and probably not 100% exact, it gives at least some indication of the behaviour, so I thought it might be useful here (measurements in clock cycles):

1020 non-cached (L1 enabled but not used)

1020 non-cached (L1 disabled)

365 L2 cached (L1 disabled)

88 L1 cached (L1 enabled and used)

Another post in the same thread gives those results:

1060 non-cached

248 L2

18 L1
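Numbers like these typically come from a single-thread pointer-chase microbenchmark: each load's address depends on the previous load's result, so latencies cannot overlap and the average cycles per load approximates the latency of whichever cache level holds the working set. Below is a minimal sketch of that technique (not the exact code from the forum thread; the 64 KB working-set size and iteration count are illustrative assumptions):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Single-thread pointer chase: each load's address depends on the
// previous load's value, so load latencies cannot be overlapped.
__global__ void pchase(const unsigned int *chain, int iters,
                       unsigned int *sink, long long *cycles)
{
    unsigned int j = 0;

    // Warm-up pass to populate the cache level under test.
    for (int i = 0; i < iters; ++i) j = chain[j];

    long long t0 = clock64();
    for (int i = 0; i < iters; ++i) j = chain[j];
    long long t1 = clock64();

    *sink = j;          // keep the dependency chain from being optimized away
    *cycles = t1 - t0;
}

int main()
{
    // One element per 128-byte cache line; 64 KB working set is assumed
    // to spill the (16/48 KB) Fermi L1 but fit comfortably in L2.
    const int line  = 128 / sizeof(unsigned int);
    const int n     = 64 * 1024 / sizeof(unsigned int);
    const int iters = 4096;

    unsigned int *h = (unsigned int *)malloc(n * sizeof(unsigned int));
    for (int i = 0; i < n; ++i)
        h[i] = (i + line) % n;   // cyclic chain, striding one line per hop

    unsigned int *d_chain, *d_sink;
    long long *d_cycles;
    cudaMalloc(&d_chain, n * sizeof(unsigned int));
    cudaMalloc(&d_sink, sizeof(unsigned int));
    cudaMalloc(&d_cycles, sizeof(long long));
    cudaMemcpy(d_chain, h, n * sizeof(unsigned int), cudaMemcpyHostToDevice);

    pchase<<<1, 1>>>(d_chain, iters, d_sink, d_cycles);

    long long cycles;
    cudaMemcpy(&cycles, d_cycles, sizeof(long long), cudaMemcpyDeviceToHost);
    printf("avg cycles per load: %.1f\n", (double)cycles / iters);

    cudaFree(d_chain); cudaFree(d_sink); cudaFree(d_cycles);
    free(h);
    return 0;
}
```

Compile with `nvcc -arch=sm_20` for Fermi; `clock64()` requires compute capability 2.0. To isolate L2 you can additionally compile with `-Xptxas -dlcm=cg`, which bypasses L1 for global loads, mirroring the "L1 disabled" rows above.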

Upvotes: 3

jmsu

Reputation: 2053

It is not as slow as global memory. I don't have a source saying so explicitly, but the CUDA programming guide states: "A cache line request is serviced at the throughput of L1 or L2 cache in case of a cache hit, or at the throughput of device memory, otherwise." For that sentence to make any sense, the throughputs must differ. And why would NVIDIA add a cache running at the same speed as global memory? On average it would perform worse, because of cache misses.

About the latency I don't know. The L2 cache size is 768 KB and the line size is 128 bytes. Appendix F.4 of the CUDA C Programming Guide has some more bits of information, especially Sections F.4.1 and F.4.2. The guide is available here: http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Programming_Guide.pdf
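Rather than hard-coding the 768 KB figure (which applies to the full GF100/GF110 chips; cut-down Fermi parts ship with less L2), you can query the size at runtime through `cudaDeviceProp`, which exposes an `l2CacheSize` field in bytes:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // l2CacheSize is reported in bytes (768 KB = 786432 on a full Fermi part).
    printf("Device: %s (compute %d.%d)\n", prop.name, prop.major, prop.minor);
    printf("L2 cache size: %d bytes\n", prop.l2CacheSize);
    return 0;
}
```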

Upvotes: 0
