Reputation: 61
I've searched and did a fair bit of reading, however could not find an answer. I believe I may have the wrong idea of caching implementation here.
I have a set of historical and current financial stock data that I want to keep cached for fast API access, these data follow the format of:
id: uuid
timestamp: unix_timestamp
data: {...}
The intent is to allow the data to be requested by the user which is served by the cache, however I want the user to be able to provide a time range such as 1483142400 - 1493510400
along with a period in the form of seconds (such as 300, 900, 1800, 3600, 86400).
I'm now in a dilemma on how to deal with this, my initial idea was to store all the timestamps using zadd
in a single sorted set per stock, and then query this using zrange
. However that would be expensive as I'd be querying too much unnecessary data unless Redis is able to 'step' through at a period (if I can provide all the keys as they follow a consistent format that is not random).
My second idea is to instead use separate keys, given my financial data is for stock XYZ:
ZADD XYZ_300 1501200300 'data' 1501200600 'data'
ZADD XYZ_900 1501200900 'data' 1501201800 'data'
ZADD XYZ_1800 1501201800 'data' 1501203600 'data'
ZADD XYZ_3600 1501203600 'data' 1501207200 'data'
This however means I have some redundant data going on as all the sets will have data that is also available in the 300 seconds table.
Moreover, I may have a few stocks such as XYZ
, ABC
and DEF
. I want to be able to query the data from 3 days ago up to 2 days ago in an hourly period for all of the above stocks, I'm not really sure if I should be using a ZUNIONSTORE
or some kind of pipeline to do this efficiently to query multiple ranges in one go.
The only guarantee right now is that I know exactly what my keys are as they're always rounded to the nearest 00:00, 05:00, 15:00 and 30:00 minutes unix timestamp. The cache itself is being set/queried from Python running Django (storing json). It might be that I'm better off storing this in files instead of a cache (as I'm using AWS I can utilize the new EFS storage system for multiple servers to share the same underlying 'harddisk/volume')
Upvotes: 1
Views: 2849
Reputation: 521
From what I understand you have 2 basic requirements: 1. To be able to query on any random time range 2. Query result aggregation by time-span
ZRANGE is basically a sorted set with a (pretty) efficient query run-time. From the Redis manual:
O(log(N)+M) with N being the number of elements in the sorted set and M the number of elements returned.
A common usecase for a many people is to keep time-series data in a sorted set:
Regarding Performance we need to look at both the insert and read:
Regarding to requirement #2: There is no easy way to do aggregations in redis.
To sum up, there is no optimal data structure for time series, you can use ZSET (a lot of people do) it will work but has is not very optimal and also not space efficient.
You can improve by:
Or you can try and use the module I wrote and maintain: https://github.com/danni-m/redis-timeseries. The features are:
Upvotes: 5
Reputation: 3322
you need to have table_name, data, timestamp_in_unix
to add
zadd table_name data timestamp_in_unix
to retrieve all values
zrangebyscore table_name -inf +inf
to retrieve in a particular range
zrangebyscore table_name from_unix_timestamp to_unix_timestamp
I hope this helps. let me know if you face any problem.
Upvotes: 0