Reputation: 3744
I have started learning HBase, and I don't understand how it scales linearly.
The problem is that before you install HBase you need an HDFS cluster. An HDFS cluster has a master node, and only one can be active in the whole cluster, so it is a bottleneck. Of course we can run one more master node (only one more is possible), but it will be in standby state. As I understand it, HBase uses the HDFS cluster to store data. So to me it seems to make no sense to run more than one HMaster, because all requests will go to the active HDFS master, whose performance can suffer if there are too many requests.
I also don't properly understand whether HBase should be installed on the same nodes as HDFS or separately. What are the benefits of running HBase separately from HDFS? To me it seems logical to install the HBase cluster on the same nodes as HDFS, as in the following example:
HDFS active master - HMaster
HDFS standby master - HMaster backup
HDFS Data node - HRegion server
To me this is the most logical structure, because if we separate the HDFS master from the HMaster, the probability of losing the HBase cluster will be twice as high.
I would be very happy if someone could share information about all of this, because I really don't understand how HBase can scale linearly and how it works with HDFS.
Upvotes: 2
Views: 1228
Reputation: 1006
First, you can install HBase over any supported file system if you want. It is not mandatory to run it over HDFS, but HDFS gives it advantages such as fault tolerance, data replication, and checksums. That is why running HBase over HDFS is recommended.
Moreover, although the NameNode is a bottleneck in HDFS, it does not affect HBase's efficiency, because not every operation depends on the HDFS NameNode. For instance, RegionServers serve data for reads and writes: when accessing data, clients communicate with HBase RegionServers directly, while region assignment and DDL operations (creating and deleting tables) are handled by the HBase Master process. This means that reading and writing data is independent of creating and deleting tables.
You can refer to https://www.mapr.com/blog/in-depth-look-hbase-architecture for more details about the HBase architecture.
Also see this webinar on HBase by Lars George: https://m.youtube.com/watch?v=_HLoH_PgrLk
I hope this clears up your doubts.
Upvotes: 3