mesibo

Reputation: 4323

Least Recently Used (LRU) Cache

I know that I can use various container classes in STL, but it's overkill and expensive for this purpose.

We have over 1M users online, and per user we need to maintain 8 unrelated 32-bit data items. The goal is to

  1. find if an item exists in the list,
  2. if not, insert. Remove oldest entry if full.

A brute-force approach would be to maintain a last-write pointer and iterate (since there are only 8 items), but I am looking for inputs to analyze and implement this better.

Looking forward to some interesting suggestions in terms of design patterns and algorithms.

Upvotes: 6

Views: 2949

Answers (6)

Sahib Yar

Reputation: 1046

You should use a Cuckoo filter, a hash-based probabilistic data structure that supports fast set-membership testing.

Time complexity of a Cuckoo filter:

  1. Lookup: O(1)
  2. Deletion: O(1)
  3. Insertion: O(1)

For reference, here is how the cuckoo filter works.

Parameters of the filter

 1. Two hash functions: h1 and h2
 2. An array B with n buckets; the i-th bucket will be called B[i]

Input: L, a list of elements to be inserted into the cuckoo filter.

Algorithm:
while L is not empty:
    Let x be the 1st item in the list L. Remove x from the list.
    if (B[h1(x)] == empty)
        place x in B[h1(x)];
    else if (B[h2(x)] == empty)
        place x in B[h2(x)];
    else
        Let y be the element in B[h2(x)]
        Prepend y to L
        place x in B[h2(x)]
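
A minimal C++ sketch of the insertion loop above, assuming a toy table with one 32-bit slot per bucket; the hash functions, the bucket count, and the kick limit (added so a full or cyclic table cannot loop forever) are my own illustrative choices, not part of the algorithm as stated:

#include <cstdint>
#include <deque>
#include <optional>
#include <vector>

// Toy cuckoo table: one 32-bit slot per bucket, two hash functions.
// h1/h2 and the bucket count are illustrative, not prescribed above.
struct CuckooTable {
    std::vector<std::optional<uint32_t>> B;
    explicit CuckooTable(size_t n) : B(n) {}

    size_t h1(uint32_t x) const { return (x * 2654435761u) % B.size(); }
    size_t h2(uint32_t x) const { return (x * 40503u + 0x9e3779b9u) % B.size(); }

    bool contains(uint32_t x) const {
        return B[h1(x)] == x || B[h2(x)] == x;
    }

    // Insert every element of L, evicting occupants as in the pseudocode.
    // A kick limit is added so that a full table cannot cycle forever.
    bool insertAll(std::deque<uint32_t> L, int maxKicks = 500) {
        int kicks = 0;
        while (!L.empty()) {
            uint32_t x = L.front();
            L.pop_front();
            if (!B[h1(x)])      { B[h1(x)] = x; }
            else if (!B[h2(x)]) { B[h2(x)] = x; }
            else {
                if (++kicks > maxKicks) return false;  // would normally trigger a rehash
                uint32_t y = *B[h2(x)];  // evict the current occupant
                L.push_front(y);         // re-insert it later
                B[h2(x)] = x;
            }
        }
        return true;
    }
};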

For LRU you can use time stamping in your hash function by keeping just a local variable.

This is the best approach for very large data sets to date.

Upvotes: 0

user207421

Reputation: 310903

Don Knuth gives several interesting and very efficient approximations in The Art of Computer Programming.

  1. Self-organizing list I: when you find an entry, move it to the head of the list; delete from the end.
  2. Self-organizing list II: when you find an entry, move it up one spot; delete from the end.

    [Both the above in Vol. 3 §6.1(A).]

  3. Another scheme involves maintaining the list circularly with 1 extra bit per entry, which is set when you find that entry and cleared when you skip past it to find something else. You always start searching at the last place you stopped, and if you don't find the entry you replace the one with the next clear bit, i.e. the one that hasn't been used since one entire trip around the list (a rough sketch follows below).

    [Vol. 1 §2.5(G).]
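
A rough C++ sketch of scheme 3 for the 8-slot case, under one (second-chance-style) reading of the description; the struct layout, names, and tie-breaking are my own choices, not Knuth's:

#include <cstdint>

// Circular "1 extra bit per entry" scheme for 8 slots: 'used' holds one
// referenced-bit per slot, 'hand' is where the last search stopped.
struct ClockCache8 {
    uint32_t item[8] = {};
    uint8_t  used  = 0;   // referenced-bit per slot
    uint8_t  valid = 0;   // which slots actually hold data
    unsigned hand  = 0;   // last position visited

    // Returns true if x was already cached; otherwise inserts it, replacing
    // the next entry whose bit is clear (not referenced since we last swept
    // past it), clearing bits as we skip over referenced entries.
    bool touch(uint32_t x) {
        for (unsigned i = 0; i < 8; ++i) {
            unsigned s = (hand + i) & 7;
            if (((valid >> s) & 1) && item[s] == x) {
                used |= 1u << s;          // found: set its bit
                hand = s;
                return true;
            }
        }
        for (;;) {                         // not found: look for a victim
            hand = (hand + 1) & 7;
            bool occupied   = (valid >> hand) & 1;
            bool referenced = (used  >> hand) & 1;
            if (!occupied || !referenced) break;   // free slot or clear bit
            used &= ~(1u << hand);         // skip past it: clear its bit
        }
        item[hand] = x;
        valid |= 1u << hand;
        used  |= 1u << hand;
        return false;
    }
};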

Upvotes: 2

MSalters

Reputation: 179819

I agree with Drop's and Geza's comments. The straightforward implementation will take one cache-line read and cause one cache-line write.

The only performance question left is the lookup and update of that 32-bit value within the 256 bits. Assuming modern x86, the lookup itself can be two instructions: _mm256_cmp_epi32_mask finds all equal values in parallel, and _mm256_lzcnt_epi32 counts the leading zeroes (= number of older, non-matching items * 32). But even with older SIMD operations, the cache-line read/write operations will dominate the execution time. That in turn is dominated by finding the right user, which in turn is dominated by the network I/O involved.
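
For illustration, a lookup over the 8 cached values with plain AVX2 (not the AVX-512 intrinsics named above) might look roughly like this; the function name and the assumption that the 8 items are contiguous in memory are mine:

#include <immintrin.h>
#include <cstdint>

// Returns the index (0..7) of 'key' within the 8 cached 32-bit items,
// or -1 if it is not present.  Build with -mavx2 (GCC/Clang).
static inline int find_slot_avx2(const uint32_t* items, uint32_t key) {
    __m256i vals   = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(items));
    __m256i needle = _mm256_set1_epi32(static_cast<int>(key));
    __m256i eq     = _mm256_cmpeq_epi32(vals, needle);         // lane-wise equality
    int mask = _mm256_movemask_ps(_mm256_castsi256_ps(eq));    // one bit per 32-bit lane
    return mask ? __builtin_ctz(static_cast<unsigned>(mask)) : -1;
}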

Upvotes: 0

Aconcagua

Reputation: 25526

I personally would either go with the self-organising lists as proposed by EJP or, as we only have eight elements, simply store them sequentially together with a timestamp.

When accessing an element, just update its timestamp; when replacing, replace the one with the oldest timestamp (one linear search). This is less efficient on replacements, but more efficient on access (no need to move any elements around). And it might be the easiest to implement...
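
A minimal sketch of that timestamp variant, ignoring counter wrap-around for brevity; the names and the 32-bit per-user counter are illustrative choices:

#include <cstdint>

// 8 data items plus a per-slot "last used" stamp; 'clock' is a per-user counter.
struct TimestampCache8 {
    uint32_t item[8]  = {};
    uint32_t stamp[8] = {};
    uint32_t clock    = 0;
    uint8_t  count    = 0;   // how many slots are in use

    // Returns true if x was present (and refreshes its stamp); otherwise
    // inserts x, evicting the slot with the oldest stamp.
    bool touch(uint32_t x) {
        ++clock;
        for (uint8_t i = 0; i < count; ++i) {
            if (item[i] == x) { stamp[i] = clock; return true; }
        }
        uint8_t victim = 0;
        if (count < 8) {
            victim = count++;                // still room: take a fresh slot
        } else {
            for (uint8_t i = 1; i < 8; ++i)  // full: linear search for the oldest
                if (stamp[i] < stamp[victim]) victim = i;
        }
        item[victim]  = x;
        stamp[victim] = clock;
        return false;
    }
};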

A modification of the self-organising lists, if based on some array data structure: sure, on update you have to shift several elements (variant I) or at least swap two of them (variant II) - but if you organise the data as a ring buffer, then on replacement we just replace the last element with the new one and move the buffer's pointer to this new element:

a, b, c, d
      ^

Accessing a:

d, b, a, c
      ^

New element e:

d, e, a, c
   ^

Special case: accessing the oldest element (d in this case) - we can then simply move the pointer, too:

d, e, a, c
^

Just: with only 8 elements, it might not be worth the effort to implement all this...

Upvotes: 0

marvel308

Reputation: 10458

If you want a C implementation of an LRU cache, try this link.

The idea is that we use two data structures to implement an LRU cache:

  1. A queue, implemented using a doubly linked list. The maximum size of the queue is equal to the total number of frames available (the cache size). The most recently used pages are near the front end and the least recently used pages near the rear end.
  2. A hash with the page number as key and the address of the corresponding queue node as value.

When a page is referenced, it may already be in memory. If it is, we detach its node from the list and bring it to the front of the queue.
If the required page is not in memory, we bring it in: in simple words, we add a new node to the front of the queue and update the corresponding node address in the hash. If the queue is full, i.e. all the frames are occupied, we remove the node at the rear of the queue and add the new node to the front.
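
Not the linked C implementation, but a compact C++ sketch of the same list-plus-hash idea; the class name, capacity handling and the 32-bit key type are illustrative:

#include <cstdint>
#include <list>
#include <unordered_map>

// LRU cache of 32-bit keys: the list holds keys from most to least recently
// used, the map gives O(1) access to each key's position in the list.
class LruCache {
    size_t capacity_;
    std::list<uint32_t> order_;                                   // front = MRU, back = LRU
    std::unordered_map<uint32_t, std::list<uint32_t>::iterator> pos_;
public:
    explicit LruCache(size_t capacity) : capacity_(capacity) {}

    // Returns true if key was already present; either way, key ends up at the front.
    bool touch(uint32_t key) {
        auto it = pos_.find(key);
        if (it != pos_.end()) {
            order_.splice(order_.begin(), order_, it->second);    // move node to the front
            return true;
        }
        if (order_.size() == capacity_) {                         // evict the least recently used
            pos_.erase(order_.back());
            order_.pop_back();
        }
        order_.push_front(key);
        pos_[key] = order_.begin();
        return false;
    }
};

For the use case in the question, each user would hold one such object constructed with capacity 8, and touch(value) would be called per data item.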

Upvotes: 1

shapiro yaacov

Reputation: 2346

Here you want to use a combination of a hash table and a doubly linked list.
Each item is accessible via the hash table, which holds the key you need plus a pointer to the element in the list.

Algorithm:

Given a new item x, do:
1. Add x to the head of the list and save the pointer to it as ptr.
2. Add x to the hash table where the data is stored, and add ptr.
3. If the list is bigger than allowed, take the last element (from the tail of the list) and remove it. Use the key of this element to remove it from the hash table as well.

Upvotes: 1
