mkorszun

Reputation: 4571

What is the most efficient way to store time series in Riak with heavy reads

My current approach:

My concern is the efficiency of queries made over a specific time window for a given application. Currently, to get a time series for a specific time window and eventually make some reductions, I have to run a map/reduce over the whole "time_metric/APPLICATION_KEY" bucket, which, from what I have found, is not a recommended use case for Riak MapReduce.

My question: what would be the best db structure for this kind of system, and how can I query it efficiently?

Upvotes: 3

Views: 2463

Answers (2)

Alex Moore

Reputation: 3455

Adding onto @macintux's answer.

Basho has had a few customers that have used Riak for time series metrics. Boundary has a nice tech talk about how they use Riak with their network monitoring software. They roll up data into different chunks of time (1m, 5m, 15m) for analysis. They also have a series of blog posts about lessons learned while implementing this system.

Kivra also has a good slide deck about how they use time-series data with Riak.

You could roll your data up into blocks of some arbitrary time length, read the range you need by issuing regular K/V gets, and then reconstruct the larger picture / reduce in your application.
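A minimal sketch of that approach, assuming 5-minute blocks and a hypothetical `app/metric/timestamp` key scheme (neither is prescribed by Riak; the actual client GETs are left out, only the key computation is shown):

```python
from datetime import datetime, timezone

BLOCK_SECS = 5 * 60  # roll-up granularity: 5-minute blocks (an assumption)

def block_keys(app_key, metric, start, end, block_secs=BLOCK_SECS):
    """Return the deterministic K/V keys covering [start, end).

    Keys look like 'my_app/cpu/1370088000' -- one per block -- so a
    time-window read is a handful of plain GETs, no MapReduce needed.
    """
    first = int(start.timestamp()) // block_secs * block_secs
    last = int(end.timestamp())
    return ["%s/%s/%d" % (app_key, metric, t)
            for t in range(first, last, block_secs)]

# A 30-minute window maps to six 5-minute block keys:
keys = block_keys("my_app", "cpu",
                  datetime(2013, 6, 1, 12, 0, tzinfo=timezone.utc),
                  datetime(2013, 6, 1, 12, 30, tzinfo=timezone.utc))
```

Because the keys are computable from the query window alone, the read path never has to list or scan the bucket; you fetch each block and merge client-side.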

Upvotes: 4

macintux

Reputation: 894

If you have spare computing power and you know in advance what keys you need, you certainly can use Riak's MapReduce, but often retrieving the keys and running your processing on the client will be as fast (and won't strain your cluster).

Some general ideas:

  • Roll up your data into larger blocks
    • If you're concerned about losing data if your client crashes while buffering it, you can always store the data as it arrives
    • Similar idea: store the data as it arrives, then retrieve it and roll it up at certain intervals
      • You can automatically expire data once you're confident it is being reliably stored in larger blocks, using either the Bitcask or Memory backends
      • The Memory backend is quite useful (RAM permitting) for any data that only needs to be stored for a limited period of time
  • Related: don't be afraid to store multiple copies of your data to make reading/reporting easier later
    • Multiple chunks of time (5- and 15-minute blocks, for example)
    • Multiple report formats
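For the automatic-expiry point above: both backends support a TTL in Riak's app.config, so raw samples can age out on their own once they've been rolled up (the values below are illustrative, not recommendations):

```erlang
%% app.config -- expire raw samples after one hour (illustrative value)
{bitcask, [
    {expiry_secs, 3600}
]},

%% or, if the raw data can live in RAM only:
{memory_backend, [
    {ttl, 3600}
]}
```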
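The "store as it arrives, then roll it up" and "multiple chunks of time" ideas above can be sketched as a client-side aggregation step (a minimal sketch; the count/sum/min/max block format and the 5- and 15-minute granularities are assumptions, and writing each rolled-up block back to Riak is left out):

```python
from collections import defaultdict

def roll_up(samples, block_secs):
    """Group (epoch_secs, value) samples into fixed-size time blocks,
    keeping count/sum/min/max per block so averages can be derived later."""
    blocks = defaultdict(lambda: {"count": 0, "sum": 0.0,
                                  "min": float("inf"), "max": float("-inf")})
    for ts, value in samples:
        b = blocks[ts // block_secs * block_secs]
        b["count"] += 1
        b["sum"] += value
        b["min"] = min(b["min"], value)
        b["max"] = max(b["max"], value)
    return dict(blocks)

# Raw samples as they arrived; roll the same data up at two granularities
# (the "multiple copies" idea) and store each copy under its own key.
samples = [(0, 1.0), (60, 3.0), (299, 5.0), (300, 7.0)]
five_min = roll_up(samples, 5 * 60)      # blocks at t=0 and t=300
fifteen_min = roll_up(samples, 15 * 60)  # single block at t=0
```

Running the same raw data through several granularities is cheap compared to re-reading it later, which is why storing multiple copies tends to pay off for read-heavy reporting.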

Having said all that, if you're doing straight key/value requests (ideally you can always compute the keys you need, rather than relying on indexing or searching), Riak can support very heavy traffic loads, so I wouldn't spend too much time building alternative storage mechanisms unless you know you're going to face latency problems.

Upvotes: 3
