StackOverflowed
StackOverflowed

Reputation: 5975

Redis pipeline, dealing with cache misses

I'm trying to figure out the best way to implement Redis pipelining. We use redis as a cache on top of MySQL to store user data, product listings, etc. I'm using this as a starting point: https://joshtronic.com/2014/06/08/how-to-pipeline-with-phpredis/

My question is, assuming you have an array of ids properly sorted. You loop through the redis pipeline like this:

$redis = new Redis();

// Opens up the pipeline
$pipe = $redis->multi(Redis::PIPELINE);

// Loops through the data and performs actions
foreach ($users as $user_id => $username)
{
    // Increment the number of times the user record has been accessed
    $pipe->incr('accessed:' . $user_id);

    // Pulls the user record
    $pipe->get('user:' . $user_id);
}

// Executes all of the commands in one shot
$users = $pipe->exec();

What happens when $pipe->get('user:' . $user_id); is not available, because it hasn't been requested before or has been evicted by Redis, etc? Assuming it's result # 13 from 50, how do we a) find out that we weren't able to retrieve that object and b) keep the array of users properly sorted?

Thank you

Upvotes: 0

Views: 1999

Answers (1)

ovanes
ovanes

Reputation: 5673

I will answer the question referring to Redis protocol. How it works in particular language is more or less the same in that case.

First of all, let's check how Redis pipeline works: It is just a way to send multiple commands to server, execute them and get multiple replies. There is nothing special, you just get an array with replies for each command in the pipeline.

Why pipelines are much faster is because roundtrip time for each command is saved, i.e. for 100 commands there is only one round-trip time instead of 100. In addition, Redis executes every command synchronously. Executing 100 commands needs potentially fighting 100 times, for Redis to pick that singular command, pipeline is treated as one long command, thus requiring only once to wait being picked synchronously.

You can read more about pipelining here: https://redis.io/topics/pipelining. One more note, because each pipelined batch runs uninterruptible (in terms of Redis) it makes sense to send these commands in overviewable chunks, i.e. don't send 100k commands in a single pipeline, that might block Redis for a long period of time, split them into chunks of 1k or 10k commands.

In your case you run in the loop the following fragment:

// Increment the number of times the user record has been accessed
$pipe->incr('accessed:' . $user_id);

// Pulls the user record
$pipe->get('user:' . $user_id);

The question is what is put into pipeline? Let's say you'd update data for u1, u2, u3, u4 as user ids. Thus the pipeline with Redis commands will look like:

INCR accessed:u1
GET user:u1
INCR accessed:u2
GET user:u2
INCR accessed:u3
GET user:u3
INCR accessed:u4
GET user:u4

Let's say:

  • u1 was accessed 100 times before,
  • u2 was accessed 5 times before,
  • u3 was not accessed before and
  • u4 and accompanying data does not exist.

The result will be in that case an array of Redis replies having:

101
u1 string data stored at user:u1
6
u2 string data stored at user:u2
1
u3 string data stored at user:u3
1
NIL

As you can see, Redis will treat missing INCR values as being 0 and execute incr(0). Finally, there is nothing being sorted by Redis and the results will come in the oder as requested.

The language binding, e.g. Redis driver, will just parse for you that protocol and give the view to parsed data. Without keeping the oder of commands it'll be impossible for Redis driver to work correctly and for you as programmer to deduce smth. Just keep in mind, that request is not duplicated in the reply i.e. you will not receive key for u1 or u2 when doing GET, but just the data for that key. Thus your implementation must remember that on position 1 (zero based index) comes the result of GET for u1.

Upvotes: 2

Related Questions