Manu Mukerji

Reputation: 255

Hive over HBase vs Hive over HDFS

My data does not need to be loaded in real time, so I don't have to use HBase. Still, I was wondering whether there are any performance benefits to using HBase in MapReduce jobs: shouldn't the joins be faster because the data is indexed?

Does anybody have any benchmarks?

Upvotes: 2

Views: 3837

Answers (3)

Wes Floyd

Reputation: 356

Performance of HBase vs. Hive:

Based on the results for HBase, Hive, and Hive on HBase, the performance of the two approaches appears to be comparable.

Hive on HBase Performance


Upvotes: 2

khan

Reputation: 2674

Respectfully :) if your data is not real-time and you are planning to run MapReduce jobs anyway, go with Hive over HDFS. Weblogs, for example, can be processed by a Hadoop MapReduce program and stored in HDFS, and Hive supports fast reads of data in an HDFS location, basic SQL, joins, and batch loads into the Hive database.

Hive also provides:
Bulk processing (and real time, if possible)
An SQL-like interface
Built-in optimized MapReduce
Partitioning of large data, which fits HDFS well

This lets you drop the HBase layer. If you added HBase here, its features would be redundant for you :)
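
The partitioning point can be illustrated with a small toy model. This is only a sketch in plain Python, not Hive code: the `records` data, the date partition key, and the function names are all made up for illustration. It mimics the idea that a partitioned table keeps each partition value in its own directory, so a query filtered on the partition key only reads the matching slice.

```python
from collections import defaultdict

# Hypothetical weblog records: (date, url, bytes_sent). Invented for this sketch.
records = [
    ("2013-01-01", "/home", 120),
    ("2013-01-01", "/about", 80),
    ("2013-01-02", "/home", 200),
    ("2013-01-03", "/home", 50),
]

def build_partitions(rows):
    """Group rows by date, like one directory per partition value."""
    partitions = defaultdict(list)
    for date, url, size in rows:
        partitions[date].append((url, size))
    return partitions

def bytes_for_day(partitions, date):
    """Read only the requested partition; return (total bytes, rows read)."""
    scanned = partitions.get(date, [])
    return sum(size for _url, size in scanned), len(scanned)

partitions = build_partitions(records)
total, rows_read = bytes_for_day(partitions, "2013-01-02")
print(total, rows_read)
```

Only one of the four rows is touched for the single-day query, which is the benefit partitioning gives a batch job without any extra storage layer.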

Upvotes: 0

Paul M

Reputation: 2046

Generally speaking, Hive over HDFS will be significantly faster than Hive over HBase. HBase sits on top of HDFS, so it adds another layer. HBase would be faster if you were looking up individual records, but you wouldn't use an MR job for that.
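
A rough way to picture the difference, as a toy model only: below, a Python dict stands in for a store indexed by row key (HBase-like) and a plain list stands in for files read end to end (Hive over HDFS-like). The data and function names are invented for illustration and say nothing about real cluster timings.

```python
# Toy data: 10,000 (row_key, value) pairs, invented for this sketch.
rows = [(f"user{i:05d}", i % 7) for i in range(10_000)]
indexed = dict(rows)  # stand-in for a store indexed by row key

def point_lookup(key):
    """Fetch one record by key: the index answers this without a scan."""
    return indexed[key]

def full_scan_sum():
    """Aggregate over every record by reading the data sequentially."""
    return sum(value for _key, value in rows)

def scan_via_index_sum():
    """Same aggregate through the keyed store: still visits every row,
    plus one key lookup per row, so the index buys nothing here."""
    return sum(indexed[key] for key, _value in rows)

print(point_lookup("user00010"))
print(full_scan_sum() == scan_via_index_sum())
```

The index pays off for `point_lookup`, but a batch job that aggregates or joins the whole data set has to read every row either way, and going through the keyed store only adds work per row.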

Upvotes: 2
