plspl
plspl

Reputation: 728

Storing and query time interval data in redis

I have to cache program schedule data based on zipcode. Each zipcode can have between 8-20k program schedule entries for a day. Each program schedule entry would look like this,

program_name,
start_time,
end_time,
channel_no,
..
..

There can be upto 10k zipcode entries.

Now, I want to cache this in such a way so that I can query at any instant to get currently running programs. For a particular zipcode, I want to query based on condition below,

start_time < current_time + 2 minutes AND end_time > current_time

So, I was thinking of couple of approaches here.

a) Use a redis list for each zipcode. List would contain all the program schedule entries. Load all the program schedule entries in memory and filter them based on query condition above.

b) Use 2 sorted sets for each zipcode. One set will use start_time as score for each program schedule entry. Another one with end_time as score. Once we have 2 sets, I could use the zrangebyscore for both sets by passing the current_time for the score param. And then do the intersection between the resulting sets.

I was wondering if there are better ways?

Upvotes: 1

Views: 871

Answers (2)

plspl
plspl

Reputation: 728

I did solve this a bit differently a while back. Thought of coming back and adding my answer incase somebody runs into a similar design issue.

Problem was each of 10k zipcodes could have their own schedule because the channel numbers can be different based on zip code. So, the schedule entries for each of these zipcode is different. Here is what I did.

  • I load schedules for next hour for all channels in USA. There were about 25k channel numbers. I do this once a hour by loading the schedules from redis into local memory.
  • I also store the zipcode <-> channel mapping within local memory.
  • When I need schedules for a particular zipcode, I get the list of channels for that zipcode and then get the schedule entries matching the channel numbers. Because, I do this in local memory the performance was pretty good!

Upvotes: 0

Itamar Haber
Itamar Haber

Reputation: 49962

The List approach (a) is likely to be less performant since you'll need to get the entire list on every query.

Sorted Sets are more suitable for this purpose, but instead of using two you can probably get away with using just one by setting the score as start_time.length, do a ZRANGEBYSCORE and then filter the result on the fractional part.

Also, whether you're using two Sorted Sets or just one, consider using a Lua script to perform the query to avoid network traffic and to localize data processing.

Upvotes: 3

Related Questions