sof2er

Reputation: 61

Query Redis timeseries by period using zrange

I've searched and done a fair bit of reading, but could not find an answer. I believe I may have the wrong idea about the caching implementation here.

I have a set of historical and current financial stock data that I want to keep cached for fast API access. These records follow the format:

id: uuid
timestamp: unix_timestamp
data: {...}

The intent is to let users request the data, served from the cache. However, I want the user to be able to provide a time range such as 1483142400 - 1493510400 along with a period in seconds (such as 300, 900, 1800, 3600, 86400).

I'm now in a dilemma about how to deal with this. My initial idea was to store all the timestamps using ZADD in a single sorted set per stock and then query it using ZRANGE. However, that would be expensive, as I'd be querying a lot of unnecessary data unless Redis is able to 'step' through at a given period (I could provide all the keys, as they follow a consistent, non-random format).
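Redis has no built-in "step" for ZRANGE, so the single-sorted-set idea would mean fetching the whole range and thinning it on the client. A minimal sketch of that, assuming redis-py and JSON-encoded members (the key name and encoding here are illustrative, not from the question):

```python
import json

def filter_period(samples, period):
    """Keep only (timestamp, data) pairs whose timestamp falls on a
    period boundary, e.g. every 900 s for a 15-minute step."""
    return [(ts, data) for ts, data in samples if ts % period == 0]

def query_range(r, stock, start, end, period):
    # withscores=True returns (member, score) pairs; the score is the
    # unix timestamp, so it doubles as the time index.
    raw = r.zrangebyscore(stock, start, end, withscores=True)
    samples = [(int(score), json.loads(member)) for member, score in raw]
    return filter_period(samples, period)
```

This works, but it is exactly the "querying too much unnecessary data" cost the question describes: a day at a 1-hour period still transfers all 288 five-minute samples.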

My second idea is to instead use separate keys, given my financial data is for stock XYZ:

ZADD XYZ_300 1501200300 'data' 1501200600 'data'
ZADD XYZ_900 1501200900 'data' 1501201800 'data'
ZADD XYZ_1800 1501201800 'data' 1501203600 'data'
ZADD XYZ_3600 1501203600 'data' 1501207200 'data'

This, however, means I have some redundant data, as all the sets contain data that is also available in the 300-second set.
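The key-per-period layout above can be sketched as follows, assuming redis-py; the `XYZ_300`-style key names follow the question's convention, and `store_sample` is a hypothetical helper:

```python
PERIODS = (300, 900, 1800, 3600, 86400)

def bucket(ts, period):
    """Round a unix timestamp down to the nearest period boundary."""
    return ts - ts % period

def period_writes(stock, ts):
    """Return the (key, bucketed_timestamp) pairs one sample maps to."""
    return [(f"{stock}_{p}", bucket(ts, p)) for p in PERIODS]

def store_sample(r, stock, ts, payload):
    # Pipeline the ZADDs so one sample costs a single round trip.
    pipe = r.pipeline()
    for key, bts in period_writes(stock, ts):
        pipe.zadd(key, {payload: bts})
    pipe.execute()
```

The redundancy is the price of making every period a direct ZRANGEBYSCORE on its own key instead of a client-side filter.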

Moreover, I may have a few stocks such as XYZ, ABC and DEF. I want to be able to query the data from 3 days ago up to 2 days ago at an hourly period for all of the above stocks. I'm not really sure whether I should be using ZUNIONSTORE or some kind of pipeline to query multiple ranges in one go efficiently.
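For what it's worth, ZUNIONSTORE would merge the sets (and, since members collide on equal scores' semantics, lose per-stock identity), so a pipeline of ZRANGEBYSCORE calls is probably the simpler fit. A hedged sketch, again assuming redis-py and the `XYZ_3600` key convention:

```python
def window_keys(stocks, period):
    """Build the per-period key names for a list of stocks."""
    return [f"{s}_{period}" for s in stocks]

def query_stocks(r, stocks, period, start, end):
    """Return {stock: [(member, timestamp), ...]} for one time window,
    fetched in a single round trip via a pipeline."""
    pipe = r.pipeline()
    for key in window_keys(stocks, period):
        pipe.zrangebyscore(key, start, end, withscores=True)
    return dict(zip(stocks, pipe.execute()))
```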

The only guarantee right now is that I know exactly what my keys are, as they're always rounded to the nearest 00:00, 05:00, 15:00 and 30:00 minutes unix timestamp. The cache itself is being set/queried from Python running Django (storing JSON). It might be that I'm better off storing this in files instead of a cache (as I'm using AWS, I could use the new EFS storage system so multiple servers share the same underlying disk/volume).

Upvotes: 1

Views: 2849

Answers (2)

Danni

Reputation: 521

From what I understand you have 2 basic requirements:

  1. To be able to query on any random time range
  2. Query result aggregation by time-span

ZRANGE queries a sorted set with a (pretty) efficient run-time. From the Redis manual:

O(log(N)+M) with N being the number of elements in the sorted set and M the number of elements returned.

A common use case is to keep time-series data in a sorted set:

  • Per timestamp, add a data key (Hash type) with the values and their identity (a stock, for example).
  • Each ZSET member must be unique, so you must keep the data in a separate hash key.
  • Query the specific stock by key and get a specific time range with ZRANGE, which has pretty good performance. A good article about this technique can be found here: https://www.infoq.com/articles/redis-time-series.
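The bullets above can be sketched roughly as follows, assuming redis-py; the `stock:timestamp` member format and the `:index` key suffix are my own illustrative choices, not from the article:

```python
def member_id(stock, ts):
    """Unique ZSET member, so identical payloads at different
    timestamps never collide."""
    return f"{stock}:{ts}"

def write_sample(r, stock, ts, fields):
    mid = member_id(stock, ts)
    r.hset(mid, mapping=fields)          # the values live in a hash
    r.zadd(f"{stock}:index", {mid: ts})  # the sorted set only indexes time

def read_range(r, stock, start, end):
    members = r.zrangebyscore(f"{stock}:index", start, end)
    return [r.hgetall(m) for m in members]
```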

Regarding Performance we need to look at both the insert and read:

  1. Write performance: ZADD is O(log(N)), plus the hash write at O(N), where N is the number of fields written (stock attributes, for example).
  2. Read performance: ZRANGE is O(log(N)+M), plus HGET for a specific key at O(1). If you have a lot of samples (M), your read performance will be reduced.

Regarding requirement #2: there is no easy way to do aggregations in Redis.

To sum up, there is no optimal Redis data structure for time series. You can use a ZSET (a lot of people do) and it will work, but it is neither optimal nor space-efficient.

You can improve by:

  • Using Lua scripts to query the raw sorted set and do the aggregation server-side, without moving a lot of data to the client.
  • Writing your own data structure using the new Modules API.

Or you can try the module I wrote and maintain: https://github.com/danni-m/redis-timeseries. The features are:

  • Efficient data store for time series (currently each sample is 16 bytes; I'm working on Gorilla compression, which will lower the space).
  • Aggregated queries by time-buckets (requirement #2).
  • Automatic compactions/downsampling (one-to-many) using different aggregation methods (max, min, avg, sum).
  • Very simple commands for adding and querying time series.

Upvotes: 5

Yatender Singh

Reputation: 3322

You need a sorted-set key (table_name), the data as the member, and the unix timestamp as the score.

To add (note that ZADD takes the score before the member):

zadd table_name timestamp_in_unix data

To retrieve all values:

zrangebyscore table_name -inf +inf

To retrieve values in a particular range:

zrangebyscore table_name from_unix_timestamp to_unix_timestamp

I hope this helps. Let me know if you face any problems.

Upvotes: 0
