Sraw

Reputation: 20264

What is the difference between MapReduce and Spark as execution engines in Hive?

It looks like there are two ways to use Spark as the backend engine for Hive.

The first is to use Spark directly as the execution engine, like this tutorial.

The other is to use Spark as the backend for MapReduce, like this tutorial.

In the first tutorial, hive.execution.engine is spark, and I cannot see HDFS involved.

In the second tutorial, hive.execution.engine is still mr, but since there is no Hadoop process running, it looks like the backend of mr is Spark.

Honestly, I'm a little bit confused about this. I guess the first one is recommended, as mr has been deprecated. But where is HDFS involved?
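For reference, the property the two tutorials differ on can be switched per session from the Hive shell. A minimal sketch of what I mean (hive.execution.engine is the real property both tutorials set; the sales table is made up):

    -- Switch Hive's execution engine for the current session.
    SET hive.execution.engine=mr;     -- classic MapReduce (deprecated in newer Hive releases)
    SET hive.execution.engine=spark;  -- Hive on Spark: Hive compiles the query, Spark executes it

    -- The same HiveQL runs unchanged under either engine; only the backend differs.
    SELECT sale_year, COUNT(*) AS orders_per_year
    FROM sales                        -- hypothetical table
    GROUP BY sale_year;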

Upvotes: 0

Views: 630

Answers (2)

Lovish saini

Reputation: 117

Apache Spark builds a DAG (directed acyclic graph), whereas MapReduce sticks with native Map and Reduce. During execution in Spark, those logical dependencies are turned into a physical execution plan.

Now, what is a DAG?

A DAG lays out the logical dependencies before execution (think of it as a visual graph). When a job has multiple map and reduce phases, or the output of one reduce is the input to another map, the DAG helps speed up the job. The DAG is built in Tez (right side of the photo) but not in MapReduce (left side).

NOTE: Apache Spark works on a DAG but has stages in place of Map/Reduce, while Tez has a DAG and works on Map/Reduce. To keep it simple I used the Map/Reduce terminology, but remember that Apache Spark has stages; the concept of the DAG remains the same.
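To make the "output of one reduce is the input to another map" case concrete, here is a rough sketch (the tables and columns are made up): under the mr engine a query like this typically compiles into two chained jobs, with the intermediate result written to HDFS between them, while a DAG-based engine plans it as a single job made of several stages.

    -- Hypothetical tables. The JOIN shuffles on customer_id and the GROUP BY
    -- shuffles on country, so Hive-on-MR usually runs this as two MapReduce
    -- jobs back to back; Tez/Spark run it as one DAG of stages instead.
    SELECT c.country, SUM(o.amount) AS total
    FROM orders o
    JOIN customers c ON o.customer_id = c.id
    GROUP BY c.country;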

Reason 2: Map persists its output to disk (it buffers first, but once the buffer is about 90% full the output spills to disk), and from there the data goes through a merge. In Apache Spark, intermediate data is persisted to memory, which makes it faster. Check this link for details.

Upvotes: 1

Ged

Reputation: 18108

I understood it differently.

Normally Hive uses MR as the execution engine, unless you use Impala, but not all distros have it.

But for a while now, Spark can also be used as the execution engine for Hive.

https://blog.cloudera.com/blog/2014/07/apache-hive-on-apache-spark-motivations-and-design-principles/ discusses this in more detail.
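To give a flavour of what that looks like in practice, a Hive-on-Spark session is configured with a handful of properties on the Hive side; a minimal sketch (the property names are real Hive/Spark settings, the values are placeholders for your cluster):

    -- Run from the Hive CLI or beeline; Hive then submits the work to Spark instead of MR.
    SET hive.execution.engine=spark;
    SET spark.master=yarn;             -- where the Spark executors run (placeholder value)
    SET spark.executor.memory=2g;      -- Spark-side tuning passed through Hive (placeholder value)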

Upvotes: 1
