Reputation: 418
I am considering Redis for caching a large amount of data. Currently I store it in my own cache written in Java. My use case is below.
I receive data from a source every 15 minutes and need to aggregate it hourly. So for a given object A, I get 4 values every hour and need to aggregate them into one value; the formula will be max / min / sum.
To build the key, I plan to use the following:
a) object id - long
b) time - long
c) property id - int (each object may have many properties, and I need to aggregate each property separately)
So the final key would look like:
objectid_time_propertyid
Every 15 minutes I may get around 50 to 60 million keys. Each time, I need to fetch these keys, convert the property value to a double, apply the formula (max/min/sum etc.), then convert back to a String and store it again. So for every key I have one read, one write, and a conversion in each direction.
My questions are as follows.
If anyone has real experience using Redis as a memcache-style cache that requires frequent updates, please share your suggestions.
Upvotes: 3
Views: 4435
Reputation: 49942
- Is it advisable to use Redis for such a use case? Going forward I may aggregate hourly data to daily, daily to weekly, and so on.
Advisable depends on who you ask, but I certainly feel Redis will be up to the job. If a single server isn't enough, your description suggests that the dataset can be easily sharded so a cluster will let you scale.
I would advise, however, that you store your data a little differently. First, every key in Redis has an overhead, so the more keys you have, the more RAM you'll need. Therefore, instead of keeping a key per object-time-property, I recommend Hashes as a means of grouping some values together. For example, you could use an object_id:timestamp key and store the property_id:value pairs under it.
Furthermore, instead of keeping the 4 discrete measurements for each object-property by timestamp and recomputing your aggregates, I suggest you keep just the aggregates and update these with new measurements. So, you'd basically have one Hash per object and hour, with the following structure:
object_id:hourtimestamp -> property_id1:max = x
                           property_id1:min = y
                           property_id1:sum = z
When getting new data - d - for an object's property, just recompute the aggregates:
property_id1:max = max(x, d)
property_id1:min = min(y, d)
property_id1:sum = z + d
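The update rule above can be sketched in plain Java. This is only an in-memory model of the Hash layout, not a Redis client: the class name `HourlyAggregates` and its methods are hypothetical, and a nested `Map` stands in for the Redis Hash. Against a real server, the sum could likely be maintained with the `HINCRBYFLOAT` command, while max and min would need a read-compare-write step (for example inside a Lua script or a transaction), since Redis has no built-in max/min update for Hash fields.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for the Redis Hash layout described above (hypothetical, not a Redis client). */
final class HourlyAggregates {
    // Outer key: "objectId:hourTimestamp"; inner fields: "propertyId:max", ":min", ":sum".
    private final Map<String, Map<String, Double>> hashes = new HashMap<>();

    /** Folds one new measurement d into the running max, min and sum of a property. */
    void record(long objectId, long hourTimestamp, int propertyId, double d) {
        Map<String, Double> hash =
                hashes.computeIfAbsent(objectId + ":" + hourTimestamp, k -> new HashMap<>());
        // merge() stores d if the field is absent, otherwise combines old value and d.
        hash.merge(propertyId + ":max", d, Math::max);
        hash.merge(propertyId + ":min", d, Math::min);
        hash.merge(propertyId + ":sum", d, Double::sum);
    }

    /** Returns the stored aggregate ("max", "min" or "sum"), or null if nothing was recorded. */
    Double get(long objectId, long hourTimestamp, int propertyId, String aggregate) {
        Map<String, Double> hash = hashes.get(objectId + ":" + hourTimestamp);
        return hash == null ? null : hash.get(propertyId + ":" + aggregate);
    }

    public static void main(String[] args) {
        HourlyAggregates agg = new HourlyAggregates();
        long hour = 1_699_999_200L; // an epoch-second value aligned to a full hour
        // Four 15-minute readings for object 42, property 7, all in the same hour bucket.
        for (double d : new double[] {3.0, 9.5, 1.25, 4.0}) {
            agg.record(42L, hour, 7, d);
        }
        System.out.println(agg.get(42L, hour, 7, "max")); // 9.5
        System.out.println(agg.get(42L, hour, 7, "min")); // 1.25
        System.out.println(agg.get(42L, hour, 7, "sum")); // 17.75
    }
}
```

Only three fields per property are kept no matter how many readings arrive, which is the point of storing aggregates instead of raw measurements.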
Repeat the same for every resolution needed, e.g. use object_id:daytimestamp to keep day-level aggregates.
Finally, don't forget to expire your keys once they are no longer required (i.e. set a 24-hour TTL for the hourly counters and so forth).
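One way to derive the hour-level and day-level keys from a reading's timestamp is sketched below, using `java.time`. The class `BucketKeys` and its method names are hypothetical. The `h:` and `d:` prefixes are my own assumption, not part of the scheme above: without them, the hour bucket starting at midnight UTC and the day bucket would produce the same key.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/** Builds bucket keys for different resolutions (hypothetical helper; days are UTC days). */
final class BucketKeys {
    private BucketKeys() {}

    /** Key for the hour bucket containing t, e.g. "h:42:1699999200". */
    static String hourKey(long objectId, Instant t) {
        return "h:" + objectId + ":" + t.truncatedTo(ChronoUnit.HOURS).getEpochSecond();
    }

    /** Key for the UTC day bucket containing t, e.g. "d:42:1699920000". */
    static String dayKey(long objectId, Instant t) {
        return "d:" + objectId + ":" + t.truncatedTo(ChronoUnit.DAYS).getEpochSecond();
    }

    public static void main(String[] args) {
        Instant reading = Instant.ofEpochSecond(1_700_000_000L);
        System.out.println(hourKey(42L, reading)); // h:42:1699999200
        System.out.println(dayKey(42L, reading));  // d:42:1699920000
    }
}
```

Each incoming reading would then update both the hour Hash and the day Hash with the same max/min/sum rule.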
There are other possible approaches, mainly using Sorted Sets, that can be applicable to solve your querying needs (remember that storing the data is the easy part - getting it back is usually harder ;)).
- What would be the performance of reads and writes in the cache? (I did a sample test on Windows, and reading and writing 100K keys took 30-40 seconds. That's not great, but the test was on Windows and I will ultimately run on Linux.)
Redis, when running on my laptop on Linux in a VM, does in excess of 500K reads and writes per second. Performance is very dependent on how you use Redis' data types and API. Given your throughput of 60 million values over 15 minutes, or ~70K/sec writes of smallish data, Redis is more than equipped to handle that.
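The throughput figure can be checked with a few lines of arithmetic. The sketch below uses the numbers from the question (60 million values per 15-minute window, one read plus one write per value); the class and method names are hypothetical.

```java
/** Back-of-the-envelope throughput estimate (hypothetical helper). */
final class ThroughputEstimate {
    private ThroughputEstimate() {}

    /** Average events per second when count events are spread evenly over the window. */
    static double perSecond(long count, long windowSeconds) {
        return (double) count / windowSeconds;
    }

    public static void main(String[] args) {
        long values = 60_000_000L;      // upper bound quoted in the question
        long windowSeconds = 15 * 60;   // one 15-minute batch
        double valuesPerSec = perSecond(values, windowSeconds);
        double opsPerSec = 2 * valuesPerSec; // one read and one write per value
        System.out.printf("%.0f values/sec, %.0f ops/sec%n", valuesPerSec, opsPerSec);
    }
}
```

That works out to roughly 67K values per second, or about 133K operations per second if every value costs a separate read and write, which is still well under the 500K/sec observed above, assuming the load is spread evenly across the window.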
- I want to use the persistence function of Redis; what are its pros and cons?
This is an extremely well-documented subject - please refer to http://redis.io/topics/persistence and http://oldblog.antirez.com/post/redis-persistence-demystified.html for starters.
Upvotes: 3