Reputation: 129
I am new to Hadoop and I recently installed Hive and HBase.
I created a few tables in Hive, and the queries run as MapReduce jobs. Also, when I issue a 'get' in HBase, it does not run as MapReduce, and that is why I believe it is of high latency.
I have a few quick questions here.
If I have an application where I need to process real time streaming data, what shall I use - Hive or HBase?
Can I install HBase with MapReduce option, such that the get command in HBase runs in MapReduce fashion?
Thanks in advance
Upvotes: 0
Views: 112
Reputation: 34184
If I have an application where I need to process real time streaming data, what shall I use - Hive or HBase?
Hive is best suited to batch processing. I would not choose it for real-time needs. As you have noticed, when you issue a Hive query, it is first converted into a MapReduce job, and only then do you get the result. This adds latency.
But the real question here is how to process real-time streaming data. Both HBase and Hive are systems that let us store data on top of an existing Hadoop cluster. Of course, you can process your data at a later stage by writing programs that use the HBase API or Hive queries. But that wouldn't be real-time processing of your streaming data, IMHO.
When you say processing of streaming data, it implies that you intend to process your data on the fly as it arrives, without having to store it first (although you can store it simultaneously). Tools like Storm are meant for that. Do have a look at it.
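To make "on the fly" concrete, here is a minimal sketch in plain Python. It is not Storm code; the event shape, `event_stream`, and `process_stream` are hypothetical names invented for illustration. The point is only that each event updates a result the moment it arrives, while storing it is an optional side step.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Event = Tuple[str, int]  # (sensor id, reading): a hypothetical event shape


def event_stream() -> Iterator[Event]:
    """Stand-in for an unbounded source such as a message queue."""
    yield from [("a", 3), ("b", 5), ("a", 7), ("b", 1)]


def process_stream(
    events: Iterable[Event], archive: List[Event]
) -> Iterator[Tuple[str, float]]:
    """Update a running mean per key as each event arrives."""
    counts: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for key, value in events:
        archive.append((key, value))  # optional: store the raw event as well
        counts[key] += 1
        totals[key] += value
        # The updated result is available immediately, before later events exist.
        yield key, totals[key] / counts[key]


archive: List[Event] = []
results = list(process_stream(event_stream(), archive))
```

A batch system would instead wait until the data has landed in storage and then run a job over all of it, which is where the extra latency comes from.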
Can I install HBase with MapReduce option, such that the get command in HBase runs in MapReduce fashion?
HBase and MapReduce are two different things. Operations like get and scan are HBase-specific and do not run as MapReduce jobs (unlike Hive queries). But you can definitely use HBase with MapReduce to read data from and write data to your HBase tables. See this for more details.
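The difference can be sketched with a toy model in plain Python. This is not the HBase client API: the `table` dict and the `get`, `map_phase`, and `reduce_phase` functions are hypothetical stand-ins. A get is a direct lookup of one row by its key, whereas a MapReduce-style job reads every row of the table as input and aggregates.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Row = Dict[str, str]

# Hypothetical table: row key -> {column: value}
table: Dict[str, Row] = {
    "row1": {"city": "Pune", "amount": "10"},
    "row2": {"city": "Delhi", "amount": "20"},
    "row3": {"city": "Pune", "amount": "5"},
}


def get(row_key: str) -> Row:
    """Point lookup by row key: touches one row, no job is launched."""
    return table[row_key]


def map_phase(rows: Dict[str, Row]) -> Iterator[Tuple[str, int]]:
    """Emit one (city, amount) pair for every row in the table."""
    for columns in rows.values():
        yield columns["city"], int(columns["amount"])


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum the emitted amounts per city."""
    totals: Dict[str, int] = defaultdict(int)
    for city, amount in pairs:
        totals[city] += amount
    return dict(totals)


single_row = get("row2")
totals_by_city = reduce_phase(map_phase(table))
```

The lookup answers a question about one known key, so there is nothing to distribute. The map and reduce pass only pays off when the question spans the whole table.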
HTH
Upvotes: 0
Reputation: 1401
HBase is a database, and it has no option to run MapReduce for its own operations such as get, scan, and put.
If you want to process data from HBase in MapReduce style, you need to write a custom MapReduce job or use another analytics tool such as Hive or Pig.
Hive is a data warehouse platform built on top of Hadoop MapReduce. It can read data from many different sources, such as HDFS files, S3 files, and HBase.
Hope this is useful to you.
Upvotes: 1