CHAPa

Reputation: 677

Are read operations expensive on Riak?

In the Basho Vimeo talk about Voxer, Matt said that "read operations are really expensive".

When a read occurs, does Riak use some kind of quorum to choose which node will provide the data?

Riak is masterless, so every node should contain the same data, shouldn't it? (After the inconsistency window that comes with eventual consistency, obviously.)

Thanks

Upvotes: 1

Views: 783

Answers (1)

Brian Roach

Reputation: 76908

You're slightly misunderstanding what he was talking about; he's talking about spinning disks and the expense of reading from them. This is not a Riak-specific issue.

They have a massive amount of data that can't fit in memory. It's even too big to easily use SSDs, because they can't cram enough of them into a server at current capacity limits (which is why, as he states in the talk, they moved away from SSDs and back to spinning disks).

If you're not using an in-memory database (which Riak isn't, unless you're using the in-memory backend), then, as Matt states in that section of his talk, whenever you have to read from disk you're simply limited by the number of IOPS your disks can give you. There's no way around that; you're reading from disk. He goes on to say that you want to cache everything you can to help with that.

That's pretty much how it works on any database platform when it comes to hitting the disks; there's no free lunch :)

If you're using Riak and your dataset exceeds the amount of memory available, you're going to have to read from disk whenever there's a "cache miss". Riak relies on the underlying OS's disk cache if you're using the default Bitcask backend; other backends may instead do their own in-memory caching.

As for your question regarding data on the nodes ... Riak is masterless and originally based on the Amazon Dynamo paper. We use consistent hashing to distribute the data around the ring, with replicas then being written to adjacent nodes, controlled by the "N value" you configure (and this is configurable on a per-bucket basis and even on a per-request basis). When you read, this same hashing method is used to find which node the data "lives" on.
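To make that concrete, here's a minimal sketch of the idea in Python. It's illustrative only: the function names (`partition_for`, `preference_list`) and the tiny ring size are my own inventions, though Riak really does hash bucket/key pairs with SHA-1 onto a ring of partitions and place the N replicas on adjacent partitions.

```python
import hashlib

RING_SIZE = 64  # hypothetical small ring; Riak's keyspace is 2^160 split into partitions

def partition_for(bucket: str, key: str, ring_size: int = RING_SIZE) -> int:
    """Hash the bucket/key pair onto the ring (simplified stand-in for
    Riak's SHA-1-based consistent hashing)."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
    return int.from_bytes(digest, "big") % ring_size

def preference_list(bucket: str, key: str, n_val: int = 3,
                    ring_size: int = RING_SIZE) -> list[int]:
    """The N replicas land on the primary partition plus the next
    (n_val - 1) adjacent partitions around the ring."""
    primary = partition_for(bucket, key, ring_size)
    return [(primary + i) % ring_size for i in range(n_val)]
```

The key property is determinism: hashing the same bucket/key always yields the same preference list, so reads are routed to the same replicas that writes went to, with no master to consult.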

A read will read from (n_val/2) + 1 nodes by default, but you can tune this on a per-request basis to suit your needs. With eventual consistency there's no guarantee that the data on those nodes will be the same at the point in time you do your read, and you may need to perform conflict resolution depending on your business logic. That said, understand that the amount of time something is inconsistent is measured in milliseconds under normal operation (i.e. you don't have a network partition or a node recovering from being down).
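The quorum arithmetic is simple enough to sketch; the helper names below are hypothetical, but the math matches the (n_val/2) + 1 default described above (integer division, so N=3 gives R=2):

```python
def default_r(n_val: int) -> int:
    """Riak's default read quorum: a strict majority of the N replicas."""
    return n_val // 2 + 1

def read_succeeds(n_val: int, r: int, replies: int) -> bool:
    """A quorum read completes once at least R of the N replicas have replied."""
    if not (1 <= r <= n_val):
        raise ValueError("R must be between 1 and N")
    return replies >= r
```

Lowering R toward 1 trades consistency for latency and availability (you accept the first reply); raising it toward N does the opposite, which is why being able to set it per request is useful.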

We have a ton of information about these things available on our website, and we're trying very hard to organize it so it's easily found. In particular you may want to look at the Riak clustering documentation for how data is distributed.

Upvotes: 4
