Pablote

Reputation: 5083

Is it a good idea to iterate over a Redis sorted set, and if so, how?

The Redis documentation and googling in general turn up very little information on this, which makes me think it might not be a great idea, or that there are some problems with it.

Basically, I've got some very large sorted sets with time series data (the score is the Unix time). I need to query potentially large intervals of time and do some post-processing on the data. I want to evaluate the performance impact, under different load scenarios, of querying the sorted set iteratively instead of with a single request/response. This might be good because it locks Redis for shorter periods of time (much like SCAN is better than KEYS), I can start post-processing earlier and in parallel while data is still being retrieved, and I don't need to load the complete data set into memory before doing something with it; data can be discarded as it is processed.

The Redis docs don't show examples of how to use LIMIT with ZRANGEBYSCORE. I can think of two ways to go about it:

1) Fixed range, variable LIMIT

ZRANGEBYSCORE my-sorted-set 1000000 2000000 LIMIT 0 10000
ZRANGEBYSCORE my-sorted-set 1000000 2000000 LIMIT 10000 10000
ZRANGEBYSCORE my-sorted-set 1000000 2000000 LIMIT 20000 10000
ZRANGEBYSCORE my-sorted-set 1000000 2000000 LIMIT 30000 10000

Same score range every time, but moving the offset.
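For reference, the first approach might look like this from Python. This is only a minimal sketch, assuming `client` is a redis-py `Redis` instance (its `zrangebyscore` accepts `start` and `num` for LIMIT); `iter_by_offset` is a hypothetical helper name. One thing to keep in mind: the ZRANGEBYSCORE documentation notes that with a large offset the sorted set has to be traversed for offset elements before reaching the ones to return, so later pages are likely to cost more than earlier ones.

```python
def iter_by_offset(client, key, min_score, max_score, page_size=10000):
    """Yield members in [min_score, max_score], one LIMIT page at a time.

    Assumption: `client` behaves like a redis-py client, i.e. it exposes
    zrangebyscore(name, min, max, start=None, num=None, ...).
    """
    offset = 0
    while True:
        # Equivalent to: ZRANGEBYSCORE key min max LIMIT offset page_size
        page = client.zrangebyscore(
            key, min_score, max_score, start=offset, num=page_size
        )
        yield from page
        # A short (or empty) page means the range is exhausted.
        if len(page) < page_size:
            return
        offset += page_size
```

Because this is a generator, each page can be processed and discarded before the next one is requested. It assumes the set is not modified in the queried range while iterating; inserts or deletes would shift the offsets.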

2) Variable range, fixed LIMIT

ZRANGEBYSCORE my-sorted-set 1000000 2000000 WITHSCORES LIMIT 0 10000
ZRANGEBYSCORE my-sorted-set 1010000 2000000 WITHSCORES LIMIT 0 10000
ZRANGEBYSCORE my-sorted-set 1020000 2000000 WITHSCORES LIMIT 0 10000
ZRANGEBYSCORE my-sorted-set 1030000 2000000 WITHSCORES LIMIT 0 10000

Here I'm setting the min to whatever the maximum score of the previous iteration was. In both cases I would stop iterating when the result is shorter than the COUNT.
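A possible gotcha with the second approach is that several members can share a score. Re-querying from the last score inclusively returns the boundary members again, while an exclusive bound can skip members that did not fit in the previous page. The sketch below is one way around that, not a definitive implementation: it keeps the last score as the new min and uses the LIMIT offset only to skip the members at that score that were already returned. It assumes `client` is a redis-py `Redis` instance and that the set is not modified while iterating; `iter_by_score_cursor` is a hypothetical name.

```python
def iter_by_score_cursor(client, key, min_score, max_score, page_size=10000):
    """Yield (member, score) pairs, moving the lower bound forward each page.

    Assumption: `client` behaves like a redis-py client, so
    zrangebyscore(..., withscores=True) returns (member, score) pairs.
    """
    lower = min_score
    skip = 0  # members with score == lower that earlier pages already returned
    while True:
        # Equivalent to:
        # ZRANGEBYSCORE key lower max WITHSCORES LIMIT skip page_size
        page = client.zrangebyscore(
            key, lower, max_score, start=skip, num=page_size, withscores=True
        )
        yield from page
        if len(page) < page_size:
            return
        last_score = page[-1][1]
        # Count how many members in this page sit at the boundary score.
        ties = sum(1 for _, score in page if score == last_score)
        if last_score == lower:
            # The whole page shared one score: keep skipping further into it.
            skip += ties
        else:
            skip = ties
        lower = last_score
```

The offset stays small here (it only covers ties at one score), so each page should avoid the long traversal that a large offset over the whole range implies.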

Is either of these better than the other? Is there some gotcha I'm missing that makes one or both of them a bad idea?

thanks!

Upvotes: 3

Views: 1644

Answers (1)

Ofir Luzon

Reputation: 10937

From the first option you wrote, it seems that the actual scores are not important to you. If so, your second option is going to waste some processing cycles and network bandwidth, since WITHSCORES returns data you don't need.

In any case, I suggest experimenting a little with the LIMIT you use. You can lower the slowlog threshold and see how long Redis spent executing each invocation. (CONFIG SET slowlog-log-slower-than 0 will log all incoming requests. The default is 10000 microseconds.)
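As a rough sketch of how that experiment could be scripted, the helper below pulls the recorded durations for ZRANGEBYSCORE calls out of the slowlog. It assumes `client` is a redis-py `Redis` instance whose `slowlog_get` returns entries as dicts with `command` and `duration` (microseconds) keys; `zrangebyscore_durations_us` is a hypothetical name.

```python
def zrangebyscore_durations_us(client, max_entries=128):
    """Return slowlog durations (microseconds) of ZRANGEBYSCORE calls.

    Assumption: `client` behaves like a redis-py client, so slowlog_get()
    yields dicts with "command" (bytes or str) and "duration" keys.
    """
    durations = []
    for entry in client.slowlog_get(max_entries):
        command = entry["command"]
        if isinstance(command, bytes):
            command = command.decode("utf-8", errors="replace")
        if command.upper().startswith("ZRANGEBYSCORE"):
            durations.append(entry["duration"])
    return durations


# Possible usage against a test instance (not run here):
#   client.config_set("slowlog-log-slower-than", 0)      # log every command
#   client.slowlog_reset()                               # start from an empty log
#   ... issue the paged ZRANGEBYSCORE calls with one LIMIT count ...
#   print(zrangebyscore_durations_us(client))
#   client.config_set("slowlog-log-slower-than", 10000)  # back to the default
```

Repeating this with a few different LIMIT counts should give a feel for how per-call time scales with page size. The slowlog only keeps a bounded number of entries, so it may need to be read after each batch of pages.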

BTW, you should check out both the Stream data type and the RedisTimeSeries module. It is very likely that at least one of them will give you better functionality.

Upvotes: 3
