Reputation: 237

Couchbase query on multiserver storage

I am working with Couchbase. I see several Couchbase servers running, with one acting as master and the rest as replica servers for a particular read/write request. Does this mean the complete data of the database is copied to all the servers? Let's say there are 10 servers: does that mean there will be 10 copies of the database on 10 different servers? Is this not an inefficient use of storage space?

During failover, only the vBucket map is updated; no data is transferred from the failed-over server to the other servers, since the rest of the servers already contain the complete data of the database. Is my understanding correct?

I read the documentation available on the Couchbase website but was not able to completely understand the answers to the above questions.

Can anyone help me answer the above questions?

Thanks in advance

Upvotes: 1

Answers (1)

Adam Taylor

Reputation: 460

Trond Norbye has an excellent explanation of vBuckets and replication on his blog.

To address your questions directly:

Couchbase distributes data throughout the cluster via the concept of vBuckets. These can be thought of as 'shards' or 'partitions' of your data. The default number of vBuckets in a cluster is 1024, so your data will be split into 1024 parts, and these are distributed evenly across the nodes in the cluster. Therefore, in your example of a cluster with 10 nodes, each node will be responsible for just over 100 vBuckets of data. The replication system also uses vBuckets: it distributes copies of the same vBuckets, but to different nodes in the cluster, so the active and replica copies of a vBucket will always be on separate nodes. If the node with the active vBucket fails over, the replica node will seamlessly begin serving traffic for that vBucket.
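To make the distribution concrete, here is a minimal sketch in Python. It is not Couchbase's actual implementation: the CRC32-modulo hashing, the round-robin layout, and the names `vbucket_for_key` and `build_vbucket_map` are assumptions made purely for illustration. It only shows the idea of 1024 partitions spread over 10 nodes, with each replica placed on a different node from its active copy.

```python
import zlib
from collections import Counter

NUM_VBUCKETS = 1024  # default vBucket count mentioned above


def vbucket_for_key(key, num_vbuckets=NUM_VBUCKETS):
    """Illustrative key -> vBucket mapping (hash the key, reduce modulo the count).

    Assumption: real clients use their own CRC32-based scheme; this is a stand-in.
    """
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def build_vbucket_map(nodes, num_vbuckets=NUM_VBUCKETS, num_replicas=1):
    """Hypothetical round-robin layout: row i is [active, replica1, ...] for vBucket i."""
    if num_replicas >= len(nodes):
        raise ValueError("need more nodes than replicas to keep copies on separate nodes")
    return [
        [nodes[(vb + r) % len(nodes)] for r in range(num_replicas + 1)]
        for vb in range(num_vbuckets)
    ]


nodes = ["node%d" % i for i in range(10)]
vbucket_map = build_vbucket_map(nodes, num_replicas=1)

# How many active vBuckets each node is responsible for.
active_counts = Counter(row[0] for row in vbucket_map)

# Look up which nodes hold a given document key.
vb = vbucket_for_key("user::1234")
active_node, replica_node = vbucket_map[vb]
```

With 10 nodes, `active_counts` comes out at 102 or 103 active vBuckets per node, which matches the "just over 100" figure, and `active_node` and `replica_node` are always different nodes.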

In the above blog post, Trond Norbye has posted a handy table to visualise this:

+------------+---------+---------+---------+
| vbucket id | active  | replica | replica2|
+------------+---------+---------+---------+
|     0      | node A  | node B  | node D  |
|     1      | node B  | node C  | node A  |
|     2      | node C  | node D  | node B  |
|     3      | node D  | node A  | node C  |
+------------+---------+---------+---------+

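So if you specify a single replica for your data, your data will be stored twice in Couchbase; with 2 replicas, three copies of the data will be stored in the cluster. The number of copies depends on the replica count you configure, not on the number of nodes, so there is no wasted storage space. :)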
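The arithmetic can be sketched as below. This is a rough estimate only: it assumes the data is spread perfectly evenly across nodes and ignores metadata and index overhead, and the function name `storage_footprint` and the 100 GB figure are made up for the example.

```python
def storage_footprint(dataset_gb, num_nodes, num_replicas):
    """Return (copies, total_gb, per_node_gb) for an evenly balanced cluster.

    Assumption: perfectly even distribution, no metadata or index overhead.
    """
    copies = 1 + num_replicas          # one active copy plus the replicas
    total_gb = dataset_gb * copies     # total stored across the whole cluster
    per_node_gb = total_gb / num_nodes
    return copies, total_gb, per_node_gb


# 100 GB of data on 10 nodes with a single replica.
copies, total_gb, per_node_gb = storage_footprint(100, num_nodes=10, num_replicas=1)

# What the question feared: a full copy of the data on every node.
full_copy_everywhere_gb = 100 * 10
```

Here the cluster stores 2 copies (200 GB in total, about 20 GB per node), compared with 1000 GB if every one of the 10 nodes held the full database.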

You are correct about the failover situation: as there are already replica vBuckets ready to take over the traffic, there is no need for data to be transferred between nodes. However, you will now have one node in the cluster serving traffic for more vBuckets than it was originally responsible for, so the cluster will be imbalanced. To resolve this, you should either bring the failed node back up or complete a rebalance.
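As a simplified illustration of why failover is only a map update, the sketch below reuses the four-vBucket table from the blog post. The `fail_over` function is hypothetical, and the real cluster manager does more than this, but the idea is the same: the failed node is dropped from each row and the first surviving replica becomes the active copy, with no documents moved.

```python
from collections import Counter

# vBucket id -> [active, replica, replica2], taken from the table above.
vbucket_map = {
    0: ["A", "B", "D"],
    1: ["B", "C", "A"],
    2: ["C", "D", "B"],
    3: ["D", "A", "C"],
}


def fail_over(vb_map, failed_node):
    """Return a new map with failed_node removed from every row.

    The first remaining entry in each row is treated as the active copy,
    so a replica is promoted wherever the failed node was active.
    Assumption: this only models the map change, not the real failover protocol.
    """
    return {
        vb: [node for node in chain if node != failed_node]
        for vb, chain in vb_map.items()
    }


after = fail_over(vbucket_map, "A")

# Count active vBuckets per node before and after the failover.
active_before = Counter(chain[0] for chain in vbucket_map.values())
active_after = Counter(chain[0] for chain in after.values())
```

Before the failover each node is active for one vBucket. Afterwards node B is active for two (vBuckets 0 and 1) while C and D still have one each, which is the imbalance that a rebalance corrects.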

In addition to the architecture overview documentation, there are also some good introductory videos on the Couchbase YouTube channel; this one in particular provides a good overview of the basics of Couchbase. The architecture white paper is also good.

Upvotes: 2
