Steven

Reputation: 871

Why Bit-PLRU is different from LRU

The following is a description of Bit-PLRU:

Bit-PLRU stores one status bit for each cache line. We call these bits MRU- bits. Every access to a line sets its MRU-bit to 1, indicating that the line was recently used. Whenever the last remaining 0 bit of a set's status bits is set to 1, all other bits are reset to 0. At cache misses, the line with lowest index whose MRU-bit is 0 is replaced.
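In code form, the quoted rule could be sketched like this (a minimal Python model; the class and method names are my own, not from any real implementation):

```python
# Minimal sketch of the Bit-PLRU rule quoted above.
class BitPLRUSet:
    def __init__(self, num_ways):
        self.mru = [0] * num_ways  # one MRU bit per cache line

    def touch(self, way):
        """Every access to a line sets its MRU bit to 1."""
        self.mru[way] = 1
        # If that was the last remaining 0 bit, reset all other bits to 0.
        if all(self.mru):
            self.mru = [0] * len(self.mru)
            self.mru[way] = 1

    def victim(self):
        """On a miss: the lowest-index line whose MRU bit is 0 is replaced."""
        return self.mru.index(0)

s = BitPLRUSet(4)
for way in (1, 2):   # access lines 1 and 2
    s.touch(way)
print(s.mru)         # [0, 1, 1, 0]
print(s.victim())    # 0
```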

I think the replacement policies of LRU and Bit-PLRU are the same, so their miss rates should also be the same.

My reasoning: at cache misses, the line with the lowest index whose MRU-bit is 0 is replaced.

The line with the lowest index means the least recently used line, so its MRU-bit is definitely zero (it couldn't be 1, because it wasn't recently used). So isn't the MRU-bit redundant?

If my reasoning is wrong, could anyone give me a reason or an example showing where Bit-PLRU and LRU differ? Why does Bit-PLRU give better performance (miss rate)?

Thanks!

Upvotes: 2

Views: 2428

Answers (1)

Alain Merigot

Reputation: 11527

Least recently used means the line with the oldest access time. But keeping track of accesses so as to always know which line is the oldest may be complex in a cache context. Storing the full access order of n blocks would require at least ceil(log2(n!)) bits, or, more practically, n×log2(n) bits, which is close for small n and much simpler to manage. Whenever a line is accessed, it must be removed from the order list, put at the top, and the rest of the list updated. This may be complex to do in one cycle.
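As a rough model of that bookkeeping (a sketch, not hardware; the class name is my own), true LRU maintains a full recency list per set:

```python
from collections import deque

# Sketch of the per-set state that true LRU must maintain.
class TrueLRUSet:
    def __init__(self, num_ways):
        # Front = most recently used, back = least recently used.
        self.order = deque(range(num_ways))

    def touch(self, way):
        # On every access, the line must be pulled out of its current
        # position and moved to the front, shifting the rest of the list.
        # This full reordering is what is hard to do in a single cycle.
        self.order.remove(way)
        self.order.appendleft(way)

    def victim(self):
        return self.order[-1]  # the oldest line

t = TrueLRUSet(4)
t.touch(0)
print(t.victim())  # 3
```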

This is the reason why pseudo-LRU methods have been developed. They guarantee that an "ancient" line will be ejected, but not that the most ancient one will be.

Consider an example for the Bit-PLRU of your question. We assume that the initial set state is the following.

line    status    real order
(index) (MRU)
0       0         3  LRU
1       1         0  MRU
2       1         1
3       0         2

The real order is not stored, but we will use it to understand the behavior of the algorithm (smallest is youngest).

Now, assume we access existing line 0. Status becomes

line    status    real order
(index) (MRU)
0       1         0  MRU
1       1         1  
2       1         2
3       0         3  LRU

Assume this is followed by a miss, so we apply the method and replace line 3:

Whenever the last remaining 0 bit of a set's status bits is set to 1, all other bits are reset to 0.

line    status    real order
(index) (MRU)
0       0         1  
1       0         2  
2       0         3  LRU
3       1         0  MRU

So the algorithm has properly ejected LRU (line 3).

Assume that there is another miss. The algorithm states:

At cache misses, the line with lowest index whose MRU-bit is 0 is replaced.

So line 0 will be replaced. But it is not the LRU line, which is line 2. Line 0 is even the "youngest" among the old lines; it simply has the lowest index.
Ejecting a better line would require additional information about the access times.
Some randomness could easily be added to the ejection, but finding the real LRU line is more complex.
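The divergence can be checked with a few lines of Python (a sketch; the state below is copied from the last table above):

```python
# State after the first miss replaced line 3 and reset the other MRU bits.
mru = [0, 0, 0, 1]         # MRU bits per line, as in the last table
real_order = [1, 2, 3, 0]  # access age per line: smallest = youngest

# Bit-PLRU: lowest index whose MRU bit is 0.
bit_plru_victim = mru.index(0)
# True LRU: the line with the oldest access time.
true_lru_victim = real_order.index(max(real_order))

print(bit_plru_victim)  # 0  (Bit-PLRU evicts line 0)
print(true_lru_victim)  # 2  (the real LRU line is line 2)
```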

Note that there are better ways to build a pseudo-LRU. Tree-PLRU, for instance, is much better, but it still does not guarantee that the real LRU line will be chosen.
For practical applications, pLRU gives a miss rate similar to real LRU, while being much simpler to implement.

But even real LRU may not always be the best policy: if a line has been accessed frequently, it is likely that it will continue to be accessed, and it should probably not be replaced even if it is the LRU line.
So the most efficient methods extend pLRU by keeping track of the number of accesses, and by treating lines that have been accessed only once differently from lines that have been accessed twice or more. This way, whenever a line has to be ejected, lines that have been accessed only once are preferred.
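One way to sketch this idea (my own illustration; real access-frequency-aware policies differ in detail) is to keep a per-line access count and, among the Bit-PLRU candidates, prefer a line that was accessed only once:

```python
# Hedged sketch: among lines whose MRU bit is 0 (the plain Bit-PLRU
# candidates), prefer one that has been accessed only once.
def pick_victim(mru_bits, access_counts):
    candidates = [i for i, b in enumerate(mru_bits) if b == 0]
    once = [i for i in candidates if access_counts[i] <= 1]
    # Fall back to plain Bit-PLRU (lowest index) if every candidate
    # has been accessed more than once.
    return min(once) if once else min(candidates)

print(pick_victim([0, 0, 1, 0], [5, 1, 3, 4]))  # 1: accessed only once
```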

Upvotes: 2
