Reputation: 793
To be clear, there are two options: 1. Install HBase on the same Hadoop cluster that also does offline computing, so there is only one Hadoop cluster. 2. Install one Hadoop cluster for offline computing, then install a second Hadoop cluster that only HBase uses for its HDFS.
So the two options are: an integrated cluster, or two separate clusters.
What are the pros and cons of these two options?
Upvotes: 1
Views: 512
Reputation: 20826
Option 1: An integrated cluster.
Pros: MapReduce jobs that read or write HBase will be more efficient because of data locality.
Cons: The HBase region server will reduce the performance of the machine (DataNode and TaskTracker), since it needs its own share of CPU and memory. HBase latency may reach seconds when many MapReduce jobs are running. So if you want HBase to respond in time, you need extra work (for example, using memcached to improve read performance).
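To make the memory contention on a shared node concrete, here is a rough back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the `NodeBudget` class, its field names, and all the numbers are hypothetical and are not Hadoop or HBase settings or recommendations. It only adds up what each daemon would claim and shows how a region server heap eats into the room left for MapReduce task slots.

```python
from dataclasses import dataclass
import math


@dataclass
class NodeBudget:
    """Hypothetical per-node memory plan; all sizes are in GB."""
    total_ram_gb: float
    os_reserve_gb: float
    datanode_heap_gb: float
    tasktracker_heap_gb: float
    task_slots: int
    heap_per_task_gb: float
    regionserver_heap_gb: float = 0.0  # 0 means no region server on this node

    def fixed_gb(self) -> float:
        # Memory claimed regardless of how many task slots are configured.
        return (self.os_reserve_gb + self.datanode_heap_gb
                + self.tasktracker_heap_gb + self.regionserver_heap_gb)

    def headroom_gb(self) -> float:
        # Negative headroom means the node is over-committed.
        used = self.fixed_gb() + self.task_slots * self.heap_per_task_gb
        return self.total_ram_gb - used

    def slots_that_fit(self) -> int:
        # Largest number of task slots that fits after the fixed claims.
        free = self.total_ram_gb - self.fixed_gb()
        return max(0, math.floor(free / self.heap_per_task_gb))


if __name__ == "__main__":
    # Made-up example: a 32 GB node with 8 task slots of 2 GB each.
    for rs_heap in (0.0, 8.0, 16.0):
        node = NodeBudget(32, 2, 1, 1, 8, 2, regionserver_heap_gb=rs_heap)
        print(rs_heap, node.headroom_gb(), node.slots_that_fit())
```

With these made-up numbers, a large enough region server heap pushes the node into over-commitment unless some task slots are given up, which is the trade-off described above. CPU contention is not modelled here.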
Option 2: 2 clusters.
Pros: The HBase region server will not impact the performance of the offline cluster's HDFS DataNodes and TaskTrackers.
Cons: MapReduce has to read and write data remotely whenever it accesses HBase. This option also needs more machines.
Upvotes: 1