Jonathan

Reputation: 11355

Faster way to iterate all keys and values in redis db

I have a db with about 350,000 keys. Currently my code just loops through all the keys and gets each value from the db.

However, this takes almost 2 minutes, which seems really slow given that redis-benchmark gave 100k requests in 3 seconds.

I've looked at pipelining but I need each value returned so that I end up with a dict of key, value pairs.

At the moment I'm thinking of using threading in my code, if possible, to speed this up. Is this the best way to handle this use case?

Here's the code I have so far.

import redis, timeit
start_time = timeit.default_timer()
count = redis.Redis(host='127.0.0.1', port=6379, db=9)
keys = count.keys()

data = {}

for key in keys:
    value = count.get(key)
    if value:
        data[key.decode('utf-8')] = int(value.decode('utf-8'))

elapsed = timeit.default_timer() - start_time

print('Time to read {} records: '.format(len(keys)), elapsed)

Upvotes: 7

Views: 7342

Answers (2)

Kees C. Bakker

Reputation: 33381

I had the same problem and ended up using KEYS and MGET to fetch multiple keys at the same time:

import redis
url='redis://my.redis.url'
query='product:*'

client = redis.StrictRedis.from_url(url, decode_responses=True)
keys = client.keys(query)

def chunks(lst, n):
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

partitions = list(chunks(keys, 10000))

data = []
for chunk in partitions:
    # one MGET round trip per chunk of up to 10,000 keys
    values = client.mget(chunk)
    data.extend(zip(chunk, values))

print(len(data))
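
If you want the dict of key to int that the question asks for, rather than a list of tuples, the same chunked MGET idea can be sketched roughly as below. The helper names (chunked, mget_as_int_dict) are made up for illustration, and it assumes a client created with decode_responses=True so keys come back as str.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def mget_as_int_dict(client, keys, chunk_size=10000):
    """Fetch `keys` with one MGET per chunk and return {key: int(value)}.

    `client` is any object with a redis-py style `mget(keys)` method
    (hypothetical helper, not part of redis-py itself).
    """
    data = {}
    for batch in chunked(keys, chunk_size):
        for key, value in zip(batch, client.mget(batch)):
            # MGET returns None for keys that no longer exist
            # or that do not hold a string value; skip those.
            if value is not None:
                data[key] = int(value)
    return data
```

With the client from the snippet above this would be called as mget_as_int_dict(client, client.keys(query)). It only works if every value really is an integer string, as in the question.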

I've written a blog post on showing progress while writing the result to a file.

This code is the base for the redis-mass-get Python package. It could be used to do the same, like this:

from redis_mass_get import RedisQuery

# pluralize will return the result or None
q = RedisQuery("redis://my.amazing.redis.url")

# query data 
data = q.query("product:*")
# data is returned as:
# [(key1, value1), (key2, value2)]

Upvotes: 4

Imaskar

Reputation: 2939

First, the fastest way is doing all of this inside EVAL.

Next, the recommended approach to iterate over all keys is SCAN. It will not iterate faster than KEYS, but it allows Redis to process other commands in between, so it helps with overall application behavior.
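
From Python, a SCAN-based version could look roughly like this. It is only a sketch: scan_as_dict and _fetch are names invented here, the batch size of 1000 is arbitrary, and it assumes a redis-py client, whose scan_iter(match=..., count=...) wraps the SCAN cursor loop.

```python
def _fetch(client, keys):
    """One MGET round trip; drop keys whose value came back as None."""
    values = client.mget(keys)
    return {k: v for k, v in zip(keys, values) if v is not None}


def scan_as_dict(client, pattern="*", batch_size=1000):
    """Walk the keyspace with SCAN and fetch values in MGET batches.

    `client` is assumed to expose redis-py style `scan_iter` and `mget`.
    SCAN may return the same key more than once; the dict absorbs that.
    """
    data = {}
    batch = []
    for key in client.scan_iter(match=pattern, count=batch_size):
        batch.append(key)
        if len(batch) >= batch_size:
            data.update(_fetch(client, batch))
            batch = []
    if batch:
        # fetch whatever is left after the last full batch
        data.update(_fetch(client, batch))
    return data
```

Unlike the Lua approach, this is not atomic: keys created or deleted while the scan is running may or may not show up in the result.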

The EVAL script would be something like:

local data={}
local i=1
local mykeys=redis.call("KEYS","*")
for k=1,#mykeys do
    local tmpkey=mykeys[k]
    data[i]={tmpkey,redis.call("GET",tmpkey)}
    i=i+1
end
return data

It will fail, however, if some keys cannot be read with GET (such as sets or lists), so you need to add error handling. If you need sorting, you can do it either directly in Lua or later on the client side. The latter is slower, but does not make other users of the Redis instance wait.

Sample output:

127.0.0.1:6370> eval "local data={} local i=1 local mykeys=redis.call(\"KEYS\",\"*\") for k=1,#mykeys do local tmpkey=mykeys[k] data[i]={tmpkey,redis.call(\"GET\",tmpkey)} i=i+1 end return data" 0
1) 1) "a"
   2) "aval"
2) 1) "b"
   2) "bval"
3) 1) "c"
   2) "cval"
4) 1) "d"
   2) "dval"
5) 1) "e"
   2) "eval"
6) 1) "f"
   2) "fval"
7) 1) "g"
   2) "gval"
8) 1) "h"
   2) "hval"
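
To get the dict the question asks for, the same script can be sent from Python. This is a minimal sketch: eval_as_dict and LUA_DUMP are names made up here, and it assumes a redis-py client (which provides eval(script, numkeys, ...)) created with decode_responses=True and a db that contains only string keys.

```python
# Same script as in the redis-cli session above, just spread over lines.
LUA_DUMP = """
local data={}
local i=1
local mykeys=redis.call("KEYS","*")
for k=1,#mykeys do
    local tmpkey=mykeys[k]
    data[i]={tmpkey,redis.call("GET",tmpkey)}
    i=i+1
end
return data
"""


def eval_as_dict(client, script=LUA_DUMP):
    """Run the script server-side and turn [[key, value], ...] into a dict.

    `client` is assumed to expose a redis-py style `eval(script, numkeys)`.
    The 0 means no key names are passed as arguments to the script.
    """
    reply = client.eval(script, 0)
    return {key: value for key, value in reply}
```

Keep in mind the whole script runs as one blocking call on the server, so with a few hundred thousand keys other clients will wait until it finishes.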

Upvotes: 4
