Nicholas

Reputation: 16056

Follow-up question on Segmenting Redis By Database

This is a follow-up question to Segmenting Redis By Database. I originally asked about the time complexity of the Redis KEYS operation in different databases within one Redis instance. I was asking because I am attempting to implement a cache where there are x multi-segment keys, each of which may have y actual data instances, resulting in x*y total keys.

However, I would like to support wild-card search of the primary keys, and it seems that the only wild-card query for keys implemented in Redis is the KEYS command, whose use is discouraged. It seemed to me a decent compromise to put the x keys in a separate database, where the lower number of keys would make the KEYS operation perform satisfactorily.

Can anyone suggest a better alternative?

Thanks.

Upvotes: 1

Views: 236

Answers (1)

Didier Spezia

Reputation: 73216

I still think using KEYS is not scalable with Redis, whatever clever scheme you put in place to work around its linear complexity.

Partitioning is one such scheme, commonly used in traditional RDBMS to reduce the cost of table scans on flat tables. Your idea is actually an adaptation of this concept to Redis.

But there is an important difference compared to the traditional RDBMS that provide this facility (Oracle, MySQL, ...): Redis is a single-threaded event loop, so a scan cannot run concurrently with any other activity (such as serving other client connections). While Redis scans data, it is blocked for all connections.

You would have to set up a huge number of partitions (i.e. of databases) to get good performance: something like 1/1000 or 1/10000 of the global number of keys. And this is why it is not scalable: Redis is not designed to handle such a number of databases, and you will likely hit issues with internal mechanisms that iterate over all the databases. Here is a list extracted from the source code:

  • automatic rehashing
  • item expiration management
  • database status logging (every 5 secs)
  • INFO command
  • maxmemory management

You would likely have to limit the number of databases, which also limits scalability. With 1000 databases, it will work fine for, say, 1M items, be slower for 10M items, and be unusable with 100M items.

If you still want to stick to linear scans to implement this facility, you will be better served by other stores that support concurrent scans (MySQL, MongoDB, etc.). With those stores, the critical point will be implementing item expiration efficiently.

If you really have to use Redis, you can easily segment the data without relying on multiple databases. For instance, you could use the method I have described here. With this strategy, the list of keys is retrieved in an incremental way, and the search is actually done on client-side. The main benefit is you can have a large number of partitions, so that Redis would not block.
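As an illustration, here is a minimal sketch of that kind of segmentation, with plain Python sets standing in for Redis sets. The partition count, key layout, and helper names are illustrative choices, not the exact method from the linked answer:

```python
import fnmatch
import zlib

N_PARTITIONS = 1024  # many small partitions keep each fetch short

def partition_of(key):
    """Map a key to a stable partition number."""
    return zlib.crc32(key.encode()) % N_PARTITIONS

# Simulated Redis sets: partition number -> set of key names.
# Against a real server, these would be SADD / SMEMBERS calls on
# keys such as "keyindex:<partition>" (hypothetical naming).
index = {p: set() for p in range(N_PARTITIONS)}

def register_key(key):
    index[partition_of(key)].add(key)

def search_keys(pattern):
    """Fetch partitions one by one and match on the client side."""
    matches = []
    for p in range(N_PARTITIONS):
        # each iteration is one short round-trip, so Redis is never
        # blocked for long, unlike a single KEYS call
        matches.extend(k for k in index[p] if fnmatch.fnmatchcase(k, pattern))
    return sorted(matches)

register_key("U:SMITH:1")
register_key("U:SMITH:2")
register_key("U:JONES:1")
print(search_keys("U:SMITH:*"))  # ['U:SMITH:1', 'U:SMITH:2']
```

Each partition fetch is a cheap O(partition size) operation, so the event loop stays responsive even though the total work is still linear.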

Now, AFAIK no storage engine provides the capability to efficiently search data with an arbitrary regular expression (i.e. avoiding a linear scan). However, this feature is provided by some search engines, typically using n-gram indexing.

Here is a good article about it from Russ Cox: http://swtch.com/~rsc/regexp/regexp4.html

This indexing mechanism could probably be adapted to Redis (you would use Redis to store a trigram index of your keys), but it represents a lot of code to write.
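To give an idea of the approach, here is a toy trigram index in Python. It is a drastic simplification of the scheme in the article; in Redis, each posting list could live in a set under a hypothetical key like trigram:&lt;gram&gt;:

```python
from collections import defaultdict

def trigrams(s):
    """All 3-character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

# trigram -> set of keys containing it (the "posting lists")
index = defaultdict(set)

def add_key(key):
    for g in trigrams(key):
        index[g].add(key)

def candidates(literal):
    """Keys that may contain `literal`, found by intersecting the
    posting lists of its trigrams instead of scanning every key."""
    grams = trigrams(literal)
    if not grams:
        return set()  # literal shorter than 3 chars: full scan needed
    return set.intersection(*(index[g] for g in grams))

add_key("U:SMITH:42")
add_key("U:JONES:7")
print(candidates("SMITH"))  # {'U:SMITH:42'}
```

A real implementation must still run the full regular expression over the candidate set, since the trigram intersection can return false positives.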

You could also imagine restricting the regular expressions to prefix searches. For instance, U:SMITH:(.*) is actually a search with the prefix U:SMITH:

In that case, you can use a zset to index your keys, and perform the linear search on the client side once the range of keys you are interested in has been retrieved. The score of each item in the zset is calculated from the key on the client side, so that score order corresponds to the lexicographic order of the keys.

With such a zset, it is possible to retrieve the range of keys you have to scan chunk by chunk, using a combination of zscore and zrange commands. The consequences are that the number of keys to scan is limited (by the prefix), the search occurs on the client side, and it is friendly to Redis' concurrency model. The drawbacks are the complexity (especially for handling item expiration) and the network bandwidth consumption.
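Here is a sketch of the whole scheme, with a sorted Python list standing in for the zset. The 6-byte packing and the chunk size are illustrative choices (6 bytes fit safely in the 52-bit mantissa of a Redis score):

```python
import bisect

WIDTH = 6  # bytes of the key packed into the score

def key_score(key):
    """Pack the first WIDTH bytes of the key into an integer so that
    score order matches the lexicographic order of the key prefixes."""
    raw = key.encode()[:WIDTH].ljust(WIDTH, b"\x00")
    return int.from_bytes(raw, "big")

# Simulated zset: a list of (score, key) kept sorted, as Redis would
zset = []

def zadd(key):
    bisect.insort(zset, (key_score(key), key))

def scan_prefix(prefix, chunk=2):
    """Walk the score range of `prefix` chunk by chunk, the way repeated
    range queries with a LIMIT would, filtering on the client side."""
    raw = prefix.encode()[:WIDTH]
    lo = key_score(prefix)
    hi = int.from_bytes(raw.ljust(WIDTH, b"\xff"), "big")
    i = bisect.bisect_left(zset, (lo, ""))
    matches = []
    while i < len(zset) and zset[i][0] <= hi:
        batch = zset[i:i + chunk]  # one round-trip per chunk
        matches.extend(k for s, k in batch
                       if s <= hi and k.startswith(prefix))
        i += chunk
    return matches

for k in ("U:SMITH:1", "U:SMITH:2", "U:JONES:1", "V:XYZ:9"):
    zadd(k)
print(scan_prefix("U:SMITH:"))  # ['U:SMITH:1', 'U:SMITH:2']
```

With a real server, the chunking would use ZRANGEBYSCORE with LIMIT; since many keys can share a score (they share the same packed prefix), the client has to track the last key seen in each chunk to avoid reprocessing items across chunk boundaries.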

Upvotes: 4
