ReverseFlowControl

Reputation: 826

Does cacheline size affect memory access latency?

Intel architectures have had 64-byte cache lines for a long time. I am curious: if, instead of 64-byte cache lines, a processor had 32-byte or 16-byte cache lines, would this improve the RAM-to-register data transfer latency? If so, by how much? If not, why not?

Thank you.

Upvotes: 2

Views: 1174

Answers (1)

Alain Merigot

Reputation: 11567

Transferring a larger amount of data of course increases the communication time. But the increase is very small due to the way memories are organized, and it does not impact memory-to-register latency.

Memory access operations are done in three steps (a numeric sketch follows this list):

  1. bit-line precharge: the row address is sent and the internal buses of the memory are precharged (duration tRP)
  2. row access: an internal row of the memory is read and written into internal latches. During that time, the column address is sent (duration tRCD)
  3. column access: the selected columns are read from the row latches and start to be sent to the processor (duration tCL)
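As a rough illustration, here is a minimal sketch that adds up the three phases to estimate the latency of a full access. The timing values are illustrative DDR4-style assumptions (as used further below), not the parameters of any specific part:

```python
# Minimal sketch of the three-phase DRAM access latency model.
# All timing values are illustrative assumptions, not datasheet numbers.

IO_FREQ_GHZ = 1.0      # DDR4 IO clock (assumed)
CPU_FREQ_GHZ = 4.0     # processor clock (assumed)

tRP  = 13   # bit-line precharge, in IO cycles (assumed 12-15)
tRCD = 13   # row access, in IO cycles (assumed 12-15)
tCL  = 11   # column access, in IO cycles (assumed 10-12)

def full_access_latency_io_cycles():
    """Latency of a full access: precharge + row access + column access."""
    return tRP + tRCD + tCL

io_cycles = full_access_latency_io_cycles()
cpu_cycles = io_cycles * CPU_FREQ_GHZ / IO_FREQ_GHZ
print(f"~{io_cycles} IO cycles = ~{cpu_cycles:.0f} processor cycles")
# -> ~37 IO cycles = ~148 processor cycles
#    (close to the ~40 / ~160 figures quoted below)
```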

Row access is a long operation. A memory is a matrix of cell elements. To increase memory capacity, cells must be made as small as possible. And when reading a row of cells, one has to drive a very capacitive and long bus that runs along a memory column. The voltage swing is very low, and sense amplifiers are required to detect the small voltage variations.

Once this operation is done, a complete row is held in latches; reading from them is fast, and the data is generally sent in burst mode.

Considering a typical DDR4 memory with a 1 GHz IO cycle time, we generally have tRP/tRCD/tCL = 12-15 cy / 12-15 cy / 10-12 cy, and the complete access time is around 40 memory cycles (if the processor frequency is 4 GHz, this is ~160 processor cycles). Then data is sent in burst mode, twice per cycle, so 2x64 bits are transferred every cycle. Hence the data transfer adds 4 cycles for 64 bytes, and it would add only 2 cycles for 32 bytes.

So reducing the cache line from 64 B to 32 B would reduce the transfer time by ~2/40 = 5%.

If the row address does not change, precharging and reading the memory row are not required, and the access time is ~15 memory cycles. In that case, the relative extra time for transferring 64 B vs 32 B is larger but still limited: ~2/15 ≈ 13%.
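Putting the same arithmetic in a short sketch (the cycle counts are the assumed figures from above, and the 16 bytes/cycle rate follows from a 64-bit bus transferring on both clock edges):

```python
# Relative transfer-time cost of 64-byte vs 32-byte lines,
# using the assumed cycle counts from the text above.

BYTES_PER_IO_CYCLE = 16          # 2 x 64 bits per IO cycle (DDR)

def transfer_cycles(line_bytes):
    return line_bytes // BYTES_PER_IO_CYCLE

for base_latency, label in [(40, "full access (closed row)"),
                            (15, "open row")]:
    t64 = base_latency + transfer_cycles(64)   # 64-byte line
    t32 = base_latency + transfer_cycles(32)   # 32-byte line
    saving = (t64 - t32) / base_latency
    print(f"{label}: {t64} vs {t32} cycles, saving ~{saving:.0%}")

# full access (closed row): 44 vs 42 cycles, saving ~5%
# open row: 19 vs 17 cycles, saving ~13%
```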

Neither evaluation takes into account the extra time required to process a miss in the memory hierarchy, so the actual percentage will be even smaller.

Data can be sent "critical word first" by the memory. If the processor requires a given word, the address of this word is sent to the memory. Once the row is read, the memory sends this word first, then the other words of the cache line. So the cache can serve the processor request as soon as the first word is received, whatever the cache line size is, and decreasing the line width would have no impact on cache latency. With this feature, the memory-to-register time would not change at all.
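To make the ordering concrete, here is a toy sketch of a critical-word-first burst (a simplified model, not the exact DDR burst-reordering protocol):

```python
# Toy model of a critical-word-first burst: the word holding the
# requested address is sent first, then the rest of the line wraps around.

WORD_BYTES = 8   # 64-bit words (assumed)

def burst_order(line_bytes, requested_offset):
    """Return word indices in the order the memory sends them."""
    n_words = line_bytes // WORD_BYTES
    first = requested_offset // WORD_BYTES
    return [(first + i) % n_words for i in range(n_words)]

# Requesting byte offset 40 inside a 64-byte line: word 5 arrives first.
print(burst_order(64, 40))   # [5, 6, 7, 0, 1, 2, 3, 4]
# The latency to the requested word is the same with a 32-byte line:
print(burst_order(32, 8))    # [1, 2, 3, 0]
```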

In recent processors, exchanges between the different cache levels are based on the full cache line width, and sending the critical word first does not bring any gain.

Besides that, large line sizes reduce compulsory misses thanks to spatial locality, and reducing the line size would have a negative impact on the cache miss rate.
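For instance, when streaming sequentially over an array, every distinct line touched is a compulsory miss, so halving the line size doubles the miss count (a simple illustrative model, with an assumed 1 MiB array):

```python
# Compulsory misses when streaming sequentially over an array:
# one miss per distinct cache line touched.

import math

ARRAY_BYTES = 1 << 20   # 1 MiB array (assumed example size)

for line_bytes in (64, 32, 16):
    misses = math.ceil(ARRAY_BYTES / line_bytes)
    print(f"{line_bytes}-byte lines: {misses} compulsory misses")

# 64-byte lines: 16384 compulsory misses
# 32-byte lines: 32768 compulsory misses
# 16-byte lines: 65536 compulsory misses
```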

Last, using larger cache lines increases the data transfer rate between the cache and memory.

The only negative aspect of large cache lines (besides the small increase in transfer time) is that the number of lines in the cache is reduced, so conflict misses may increase. But with the large associativity of modern caches, this effect is limited.
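A quick way to see the effect on cache geometry, using an assumed 32 KiB, 8-way set-associative cache as an example:

```python
# Lines and sets for an assumed 32 KiB, 8-way set-associative cache.

CACHE_BYTES = 32 * 1024
WAYS = 8

for line_bytes in (64, 32):
    lines = CACHE_BYTES // line_bytes
    sets = lines // WAYS
    print(f"{line_bytes}-byte lines: {lines} lines, {sets} sets")

# 64-byte lines: 512 lines, 64 sets
# 32-byte lines: 1024 lines, 128 sets
```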

Upvotes: 4
