Memcache tags simulation

Question

Memcached is a great, scalable cache layer, but it has one big problem (for me): it cannot manage tags. And tags are really useful for group invalidation.

I have done some research and I'm aware of some solutions:

Memcache tag fork http://code.google.com/p/memcached-tag/
Code implementation to emulate tags (ref. Best way to invalidate a number of memcache keys using standard php libraries?)

One of my favorite solutions is namespaces, which is explained on the memcached wiki.

However, I don't understand why we integrate the namespace into the cache key.

From what I understand of the namespace trick: to generate a key, we have to get the value of the namespace (from the cache). And if the namespace->value cache entry is evicted, we can no longer compute the right key to fetch the cached data... So the cache entries for this namespace are virtually invalidated (I say virtually because the entries still exist, but we can no longer compute the keys to access them).
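For reference, the namespace trick described above can be sketched roughly as follows. This is only an illustration under assumptions: `ArrayCache` is a hypothetical in-memory stand-in for a memcached client (so the sketch runs without a server), and `namespacedKey()`, `invalidateNamespace()`, and the `ns:` prefix are names made up for this example.

```php
<?php
// Hypothetical in-memory stand-in for a memcached client (assumption for this
// sketch). get() returns false on a miss, as memcached clients do.
class ArrayCache {
    private $data = [];
    public function get($key) {
        return array_key_exists($key, $this->data) ? $this->data[$key] : false;
    }
    public function set($key, $value) {
        $this->data[$key] = $value;
        return true;
    }
    public function delete($key) {
        unset($this->data[$key]);
        return true;
    }
    public function increment($key, $by = 1) {
        if (!isset($this->data[$key])) {
            return false; // nothing to increment
        }
        return $this->data[$key] += $by;
    }
}

// Build the real cache key from the namespace's current version number.
function namespacedKey($cache, $namespace, $key) {
    $version = $cache->get("ns:" . $namespace);
    if ($version === false) {
        // Namespace entry missing (never set, or evicted): start a fresh
        // version. A random start makes reusing an old version unlikely.
        $version = random_int(1, 1000000000);
        $cache->set("ns:" . $namespace, $version);
    }
    return $namespace . ":" . $version . ":" . $key;
}

// Invalidate a whole namespace by bumping its version. Old entries stay in
// the cache but their keys can no longer be computed.
function invalidateNamespace($cache, $namespace) {
    // If the namespace entry is missing, its keys are already unreachable.
    $cache->increment("ns:" . $namespace);
}
```

Every read and write goes through `namespacedKey()`, so the cost per access is one extra get for the namespace entry.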

So why can we not simply implement something like:

tag1->[key1, key2, key5]
tag2->[key1, key3, key6]
key1->["value" => value1, "tags" => [tag1, tag2]]
key2->["value" => value2, "tags" => [tag1]]
key3->["value" => value3, "tags" => [tag2]]
etc...

With this implementation I come back to the problem that if tag1->[key1, key2, key5] is evicted, we can no longer invalidate the keys of tag1. But with:

function load($memcache, $cacheId) {
   // Entries are stored as ["value" => ..., "tags" => [...]]
   $cache = $memcache->get($cacheId);
   if (!is_array($cache)) {
      // Cache miss (or unexpected format)
      return false;
   }
   // Check that none of the entry's tags has been evicted
   foreach ($cache["tags"] as $tagId) {
      if ($memcache->get($tagId) === false) {
         // Not mandatory: drop the orphaned entry
         $memcache->delete($cacheId);
         return false;
      }
   }
   // All tags are still present, so we can return the entry
   return $cache;
}

This is pseudocode.

This way, we return a cached entry only if all of its tags are still available.

The first objection is: "each time you need to get a cached entry, you have to check (get) X tags and then check an array". But with namespaces we also have to get the namespace entry to retrieve its value; the main difference is iterating over an array... But I do not think keys will have many tags (I cannot imagine more than 10 tags per key for my application), so iterating over an array of size 10 is quite fast.

So my question is: has someone already thought about this implementation? What are its limits? Did I forget something? Etc.

Or maybe I have misunderstood the concept of namespaces...

PS: I'm not looking for another cache layer like memcached-tag or redis

user1466119 · Accepted Answer

I think you are forgetting something with this implementation, but it's trivial to fix.

Consider the problem of multiple keys sharing some tags:

key1 -> tag1 tag2
key2 -> tag1 tag2
tag1 -> key1 key2
tag2 -> key1 key2

Say you load key1. You double check both tag1 and tag2 exist. This is fine and the key loads.

Then tag1 is somehow evicted from the cache.

Your code then invalidates tag1. This should delete key1 and key2 but because tag1 has been evicted, this does not happen.

Then you add a new item key3. It also refers to tag1:

key3 -> tag1

When saving this key, tag1 is (re)created:

tag1 -> key3

Later, when loading key1 from cache again, your check in the pseudocode to ensure tag1 exists succeeds, and the (stale) data from key1 is allowed to be loaded.

Obviously a way around this is to check the values of the tag1 data to ensure the key you are loading is listed in that array and only consider your key valid if this is true.
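A rough sketch of that stricter check, under assumptions: `ArrayCache` is a hypothetical in-memory stand-in for a memcached client, and `saveWithTags()` / `loadStrict()` are names invented for this example, not part of any library. The read-modify-write on the tag entry is also not atomic, which a real implementation would need to handle.

```php
<?php
// Hypothetical in-memory stand-in for a memcached client (assumption for this
// sketch). get() returns false on a miss, as memcached clients do.
class ArrayCache {
    private $data = [];
    public function get($key) {
        return array_key_exists($key, $this->data) ? $this->data[$key] : false;
    }
    public function set($key, $value) {
        $this->data[$key] = $value;
        return true;
    }
    public function delete($key) {
        unset($this->data[$key]);
        return true;
    }
}

// Store the entry and register its key in every tag's key list.
function saveWithTags($cache, $cacheId, $value, array $tags) {
    $cache->set($cacheId, ["value" => $value, "tags" => $tags]);
    foreach ($tags as $tagId) {
        $keys = $cache->get($tagId);
        if (!is_array($keys)) {
            $keys = []; // tag is new, or was evicted: (re)create it
        }
        if (!in_array($cacheId, $keys, true)) {
            $keys[] = $cacheId;
        }
        $cache->set($tagId, $keys); // NOTE: not atomic in this sketch
    }
}

// Return the entry only if every one of its tags still lists this key.
function loadStrict($cache, $cacheId) {
    $entry = $cache->get($cacheId);
    if (!is_array($entry)) {
        return false;
    }
    foreach ($entry["tags"] as $tagId) {
        $keys = $cache->get($tagId);
        if (!is_array($keys) || !in_array($cacheId, $keys, true)) {
            // Tag evicted, or recreated without this key: treat as stale.
            $cache->delete($cacheId);
            return false;
        }
    }
    return $entry;
}
```

With this check, the scenario above (tag1 evicted, then recreated by key3) no longer lets the stale key1 through, because key1 is not listed in the recreated tag1.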

Of course, this could have performance issues depending on your use case. If a given key has 10 tags, but each of those tags is used by 10k keys, then you have to search through an array of 10k items to find your key, and repeat that 10 times each time you load something.

At some point, this may become inefficient.

An alternative implementation (and one which I use), is more appropriate when you have a very high read to write ratio.

If reads are very much the common case, then you could implement your tag capability in a more permanent database backend (I'll assume you have a db of sorts anyway so it only needs a couple extra tables here).

When you write an item in the cache, you store the key and the tag in a simple table (key and tag columns, one row for each tag on a key). Writing a key is simple: "delete from cache_tags where key=:key; foreach (tags as tag) insert into cache_tags values(:key, :tag);" (NB: use extended insert syntax in a real implementation).

When invalidating a tag, simply iterate over all keys that have that tag (select key from cache_tags where tag=:tag;) and invalidate each of them (and optionally delete the key from the cache_tags table too, to tidy up).

If a key is evicted from memcache then the cache_tags metadata will be out of date, but this is typically harmless. It will at most result in an inefficiency when invalidating a tag where you attempt to invalidate a key which had that tag but has already been evicted.
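A minimal sketch of this database-backed approach, under assumptions: it uses PDO, the column is named `cache_key` rather than `key` (because `key` is a reserved word in MySQL), `ArrayCache` is a hypothetical in-memory stand-in for a memcached client, and the function names are invented for this example.

```php
<?php
// Hypothetical in-memory stand-in for a memcached client (assumption for this
// sketch). get() returns false on a miss, as memcached clients do.
class ArrayCache {
    private $data = [];
    public function get($key) {
        return array_key_exists($key, $this->data) ? $this->data[$key] : false;
    }
    public function set($key, $value) {
        $this->data[$key] = $value;
        return true;
    }
    public function delete($key) {
        unset($this->data[$key]);
        return true;
    }
}

// One row per (key, tag) pair.
function createTagTable(PDO $db) {
    $db->exec("CREATE TABLE IF NOT EXISTS cache_tags (cache_key TEXT NOT NULL, tag TEXT NOT NULL)");
}

// Expensive path: replace the key's tag rows, then write the cache entry.
function saveTagged(PDO $db, $cache, $key, $value, array $tags) {
    $db->beginTransaction();
    $db->prepare("DELETE FROM cache_tags WHERE cache_key = :key")
       ->execute([":key" => $key]);
    $insert = $db->prepare("INSERT INTO cache_tags (cache_key, tag) VALUES (:key, :tag)");
    foreach ($tags as $tag) {
        $insert->execute([":key" => $key, ":tag" => $tag]);
    }
    $db->commit();
    $cache->set($key, $value);
}

// Delete every cached key that carries the tag, then tidy up the table.
function invalidateTag(PDO $db, $cache, $tag) {
    $select = $db->prepare("SELECT cache_key FROM cache_tags WHERE tag = :tag");
    $select->execute([":tag" => $tag]);
    $keys = $select->fetchAll(PDO::FETCH_COLUMN);
    $delete = $db->prepare("DELETE FROM cache_tags WHERE cache_key = :key");
    foreach ($keys as $key) {
        $cache->delete($key);               // harmless if already evicted
        $delete->execute([":key" => $key]); // optional tidy up
    }
}
```

To try it without a database server, an in-memory SQLite connection such as `new PDO("sqlite::memory:")` should work if the pdo_sqlite extension is available. Reads then go straight to the cache with a plain `get()`, with no tag checks at all.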

This approach gives "free" loading (no need to check tags) but expensive saving (which is already expensive anyway otherwise it wouldn't need to be cached in the first place!).

So depending on your use case and the expected load patterns and usage, I'd hope that either your original strategy (with more stringent checks on load) or a "database backed tag" strategy would fit your needs.

HTHs
