Reputation: 1619
I'm concerned by this note in Riak's documentation:
N=3 simply means that three copies of each piece of data will be stored in the cluster. That is, three different partitions/vnodes will receive copies of the data. There are no guarantees that the three replicas will go to three separate physical nodes; however, the built-in functions for determining where replicas go attempts to distribute the data evenly.
https://docs.basho.com/riak/kv/2.1.3/learn/concepts/replication/#so-what-does-n-3-really-mean
I have a cluster of 6 physical servers with N=3. I want to be 100% sure that total loss of some number of nodes (1 or 2) will not lose any data. As I understand the caveat above, Riak cannot guarantee that. It appears that there is some (admittedly low) portion of my data that could have all 3 copies stored on the same physical server.
In practice, this means that for a sufficiently large data set I'm guaranteed to completely lose records if I have a catastrophic failure on a single node (gremlins eat/degauss the drive or something).
Is there a Riak configuration that avoids this concern?
Unfortunate confounding reality: I'm on an old version of Riak (1.4.12).
Upvotes: 2
Views: 163
Reputation: 1001
There is no configuration that avoids the minuscule possibility that a partition might have 2 or more copies on one physical node (although having 5+ nodes in your cluster makes it extremely unlikely that a single node with have more than 2 copies of a partition). With your 6 node cluster it is extremely unlikely that you would have 3 copies of a partition on one physical node.
The riak-admin command line tool can help you explore your partitions/vnodes. Running riak-admin vnode-status
(http://docs.basho.com/riak/kv/2.1.4/using/admin/riak-admin/#vnode-status) on each node for example will output the status of all vnodes the are running on the local node the command is run on. If you run it on every node in your cluster you confirm whether or not your data is distributed in a satisfactory way.
Upvotes: 1