saurabh shashank

Reputation: 1353

Performance comparison: Hive & MapReduce

Hive provides an abstraction layer over Java MapReduce jobs, so it should have some performance overhead compared to hand-written Java MapReduce jobs.

Is there any benchmark that compares the performance of Hive queries and Java MapReduce jobs?

Actual use cases with run-time data would be a real help.

Thanks

Upvotes: 0

Views: 2748

Answers (3)

Prabhat Jain

Reputation: 331

  1. If you have a small dataset on your machine, processing it with Apache Hive will be slower than processing the same dataset with a hand-written Hadoop MapReduce job; Hive's performance degrades slightly on small datasets. For large datasets, on the other hand, Apache Hive would perform better than MapReduce.

  2. When you process data with MapReduce, the dataset is stored in HDFS. MapReduce has no database of its own, whereas Hive has its metastore. Through Hive's metastore, data can be shared with Impala, Beeline, and JDBC and ODBC drivers.

Upvotes: 0

Arnon Rotem-Gal-Oz

Reputation: 25909

Your premise that Hive must have a performance problem compared to Java MapReduce jobs is wrong.

Hive (and Pig, Crunch, and other MapReduce abstractions) would be slower than a fully tuned, hand-written MapReduce job.

However, unless you're experienced with Hadoop and MapReduce, chances are that the MapReduce code you'd write would be slower on non-trivial queries than what Hive et al. will generate.
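As a rough illustration of what "tuned" can mean, here is a minimal sketch of one classic optimization: a combiner that pre-aggregates on the map side so fewer records cross the shuffle. It is a plain-Python simulation, not Hadoop code; the function names (`map_phase`, `combine`, `run_job`) and the sample data are made up for the example.

```python
from collections import Counter, defaultdict

def map_phase(split):
    # Emit (word, 1) for every word, as a naive word-count mapper would.
    return [(word, 1) for line in split for word in line.split()]

def combine(pairs):
    # Map-side pre-aggregation: sum the counts per key within one split.
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def reduce_phase(shuffled):
    # Sum all values that arrive for each key.
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals)

def run_job(splits, use_combiner):
    # Returns the final counts and the number of records sent to the reducer.
    shuffled = []
    for split in splits:
        pairs = map_phase(split)
        if use_combiner:
            pairs = combine(pairs)
        shuffled.extend(pairs)
    return reduce_phase(shuffled), len(shuffled)

# Two hypothetical input splits.
splits = [["a b a", "a c"], ["b b b", "a"]]
naive_result, naive_shuffled = run_job(splits, use_combiner=False)
tuned_result, tuned_shuffled = run_job(splits, use_combiner=True)

print(naive_result == tuned_result)    # same answer either way
print(naive_shuffled, tuned_shuffled)  # fewer shuffled records with the combiner
```

Both runs give the same counts, but the combiner version moves fewer records between the map and reduce phases. Someone new to MapReduce can easily leave out a step like this, whereas an abstraction layer can apply it for you.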

Upvotes: 6

Pieterjan

Reputation: 617

I did a small test in a VM some time back and couldn't really notice any difference. Hive was maybe a few seconds slower at times, but I can't tell whether that was Hive's performance or my VM hanging due to low memory. One thing to keep in mind is that Hive works out the execution plan for the MapReduce job for you and will try to pick the fastest one. When you write small MapReduce jobs, you'll probably be able to find the fastest way yourself. But with large, complex jobs (with joins, etc.), will you always be able to compete with Hive?

Also, writing a MapReduce job with multiple classes and methods seems to take ages compared with writing a HiveQL query.
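To give a feel for the difference in effort, here is a minimal sketch of a reduce-side join written out by hand, next to the one-line HiveQL query it roughly corresponds to. This is a plain-Python simulation of the map, shuffle, and reduce steps, not real Hadoop code; the table names, column names, and sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical HiveQL equivalent (table and column names are made up):
#   SELECT u.name, o.amount FROM users u JOIN orders o ON u.id = o.user_id;

def map_users(rows):
    # Tag each record with its source table so the reducer can tell them apart.
    for user_id, name in rows:
        yield user_id, ("U", name)

def map_orders(rows):
    for user_id, amount in rows:
        yield user_id, ("O", amount)

def shuffle(pairs):
    # Group all tagged values by join key, as the shuffle phase would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_join(groups):
    # For each key, pair every user record with every order record (inner join).
    joined = []
    for key in sorted(groups):
        names = [v for tag, v in groups[key] if tag == "U"]
        amounts = [v for tag, v in groups[key] if tag == "O"]
        for name in names:
            for amount in amounts:
                joined.append((name, amount))
    return joined

users = [(1, "ann"), (2, "bob"), (3, "cy")]
orders = [(1, 10), (1, 25), (2, 7), (4, 99)]
pairs = list(map_users(users)) + list(map_orders(orders))
result = reduce_join(shuffle(pairs))
print(result)
```

Even in this toy form, the join needs tagging, grouping, and pairing logic that the HiveQL query expresses in a single statement. A real Java job would add input formats, Writable types, and a driver class on top.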

On the other hand, I had the feeling that when I wrote the job myself it was easier to know what was going on.

Upvotes: 1
