Does HBase use the compute capacity of all nodes in the cluster for query execution?

Question

We are having a setup of 1 master and 2 slave nodes. The data is setup in postgres and in hbase and its a similar dataset (same number of rows) - 65 million rows. Yet, we dont find a measurable increase in performance from HBase for the same query.

My first thought is - does HBase use the compute capacity of all nodes to fork the query out? Perhaps this is why the performance is not measurably better.

Any other reasons for why the performance between Postgres and HBase would be about the same? Any specific configuration items to look for?

EDIT : Something I found while researching this : http://www.flurry.com/2012/06/12/137492485#.VaQP_5QpBpg

brandon.bell · Accepted Answer

This is kind of a yes and no answer. Depending on what you are doing for your 'query' and your region distribution, you may or may not be using all the nodes. For example, if you are running a scan across the table it will run against each region (assuming more then one) in sequence. However if you are using a multi-get for keys that are in different regions, this will run in parallel.

The real benefit is going to come as the number of regions increase and you start parallelizing requests (multiple clients). Regions will be distributed across region servers by the Master as regions are split.

Does HBase use the compute capacity of all nodes in the cluster for query execution?

Answers (1)

Related Questions