Reputation: 5870
My HBase cluster runs on three machines: one for the HMaster and the other two as RegionServers. Since this is a production environment, I think I need some kind of replication to protect against a machine crashing or losing power. I have read some HBase documentation but couldn't find a way to replicate my data. The only thing I am using is Hadoop replication: I set dfs.replication=1 in hdfs-site.xml. Is there a better way to replicate HBase data for hot backup? Thanks in advance.
Upvotes: 1
Views: 2392
Reputation: 11
For now, there are three concepts of replication.
dfs.replication refers to the first concept. With a value greater than 1, it ensures that your data stays safe when a single physical machine fails.
If dfs.replication = 1, a single failure on your DataNode (a disk failure, for example) can corrupt a data block, which leads to data loss.
Upvotes: 0
Reputation: 53
HBase tables are stored in HDFS as blocks of data. The Hadoop Distributed File System (HDFS) lets us choose the replication factor for those blocks; it is normally kept at 3. This means that at any given time 3 copies of the same data are present on the nodes of the cluster, so if any one node fails the same data is still available elsewhere to serve a query. The property is dfs.replication in hdfs-site.xml. HBase also allows us to replicate the cluster state to another cluster, that is, all the data present in one HBase cluster is copied to another cluster. The advantage of this is disaster recovery.
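To make the effect of the replication factor concrete, here is a minimal sketch in Python. It is a toy model, not HDFS itself: the functions `place_blocks` and `lost_blocks` are hypothetical names, and the round-robin placement is an assumption made for illustration (real HDFS placement is rack-aware). It only shows which blocks disappear when one node fails.

```python
def place_blocks(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes (toy round-robin placement)."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of DataNodes")
    placement = {}
    for block in range(num_blocks):
        placement[block] = {nodes[(block + i) % len(nodes)] for i in range(replication)}
    return placement


def lost_blocks(placement, failed_nodes):
    """Return the blocks whose every replica lived on a failed node."""
    failed = set(failed_nodes)
    return sorted(block for block, holders in placement.items() if holders <= failed)


# Two DataNodes, assumed to match the two RegionServer machines in the question.
nodes = ["dn1", "dn2"]
for replication in (1, 2):
    placement = place_blocks(6, nodes, replication)
    print(replication, lost_blocks(placement, ["dn1"]))
# prints:
# 1 [0, 2, 4]
# 2 []
```

With a factor of 1, every block that lived on the failed node is gone; with a factor of 2, each block still has a surviving copy.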
Upvotes: 0
Reputation: 418
Be aware that "replication" has two different meanings in your question:
Replication of HDFS blocks. Here replication means "keeping multiple (redundant) copies of a block on different data nodes", which is how HDFS achieves high availability. You tell HDFS how many copies to keep with the "dfs.replication" property. Check the Data Replication section of the "HDFS Architecture Guide".
Replication between HBase clusters. Here replication means "send the updates of this cluster to another cluster so that the latter can serve as a backup". It can serve as a disaster recovery solution, which I guess is what you want. You need to set up another HBase cluster (called the slave or backup cluster) and configure replication to it. After that you can fail over to the backup cluster when the master cluster is down for some reason. Check this Cloudera blog post and this section of the HBase book for more details.
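The second meaning can be pictured with a small Python sketch. This is a toy in-memory model, not the HBase API: the class `ToyCluster` and its methods are hypothetical. It assumes the usual behaviour of HBase cluster replication, where edits are queued on the master cluster and shipped to the peer asynchronously, so the backup can briefly lag behind.

```python
class ToyCluster:
    """In-memory stand-in for an HBase cluster (hypothetical, for illustration only)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.pending = []  # edits written locally but not yet shipped to the peer
        self.peer = None

    def put(self, key, value):
        """Apply an edit locally and queue it for the peer, if one is configured."""
        self.rows[key] = value
        if self.peer is not None:
            self.pending.append((key, value))

    def ship_edits(self):
        """Send all queued edits to the peer and return how many were shipped."""
        shipped = len(self.pending)
        for key, value in self.pending:
            self.peer.rows[key] = value
        self.pending.clear()
        return shipped


master = ToyCluster("master")
backup = ToyCluster("backup")
master.peer = backup

master.put("row1", "a")
master.put("row2", "b")
lagging = dict(backup.rows)   # still empty: nothing has been shipped yet
master.ship_edits()
print(lagging, backup.rows == master.rows)
```

The point of the sketch is the failover property: once the queued edits have been shipped, the backup holds the same rows as the master and can take over.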
Upvotes: 2
Reputation: 506
You do not need a separate replication factor for HBase. As mentioned before, HBase stores its data on HDFS, so replication is handled by HDFS. If a RegionServer goes down (in HBase), the HMaster reassigns the regions handled by the dead RegionServer to a healthy RegionServer. If a DataNode fails (in HDFS), the blocks it held are read from their remaining replicas on other DataNodes, which the NameNode locates, provided the replication factor is greater than 1.
Upvotes: 1
Reputation: 409
HBase relies fully on HDFS replication.
All your data lives in HDFS, not in HBase (HBase stores it in HDFS internally). HBase is just an access mechanism for this data. Since you have set dfs.replication=1, try taking DataNode backups at regular intervals.
If you are worried about the state of the RegionServers in HBase, that information is kept in ZooKeeper. Even if your Master or a RegionServer goes down and comes back up, it should return to a normal state.
If you are worried about the regions specifically, HBase tracks them in catalog tables (ROOT, META). These are just like normal tables and are also stored in HDFS.
So change the replication factor to a value greater than 1 (or leave it at the default of 3), as the community advises.
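As a rough illustration of that change, the following Python sketch builds the corresponding hdfs-site.xml fragment. The property name dfs.replication comes from the answers above; the helper `replication_property` is a hypothetical name, and the value 3 is simply the default mentioned here, not a recommendation tuned to your cluster.

```python
import xml.etree.ElementTree as ET


def replication_property(factor):
    """Build the <property> element that sets dfs.replication to `factor`."""
    if factor < 1:
        raise ValueError("replication factor must be at least 1")
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.replication"
    ET.SubElement(prop, "value").text = str(factor)
    return prop


config = ET.Element("configuration")
config.append(replication_property(3))
if hasattr(ET, "indent"):  # pretty-printing helper, available from Python 3.9
    ET.indent(config)
print(ET.tostring(config, encoding="unicode"))
```

Keep in mind that a factor of 3 can only be fully satisfied with at least three DataNodes, and that the setting applies to files written after the change; files that already exist keep their old factor until it is changed explicitly.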
Upvotes: 1
Reputation: 1811
HBase uses HDFS to store its data, so your data is replicated by HDFS (the default replication factor in HDFS is 3). As long as that default is in effect, you do not need to set up replication explicitly.
Upvotes: 1
Reputation: 109
In your production environment you need a replica of your data so that in case of a node failure or cluster failure your data will remain secure. If my understanding is correct then you can either go for
Upvotes: 2