Reputation: 20264
It looks like there are two ways to use Spark as the backend engine for Hive.

The first is to use Spark directly as the engine, like this tutorial. The other is to use Spark as the backend engine for MapReduce, like this tutorial.

In the first tutorial, `hive.execution.engine` is `spark`, and I cannot see `hdfs` involved. In the second tutorial, `hive.execution.engine` is still `mr`, but as there is no `hadoop` process, it looks like the backend of `mr` is Spark.

Honestly, I'm a little bit confused about this. I guess the first one is recommended, as `mr` has been deprecated. But where is `hdfs` involved?
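For reference, the engine switch the two tutorials make comes down to a single property; a minimal sketch of what each setting looks like in a Hive session (or in `hive-site.xml`):

```sql
-- In a Hive CLI / Beeline session:
SET hive.execution.engine=spark;  -- first tutorial: Spark as the engine
SET hive.execution.engine=mr;     -- second tutorial: the deprecated MapReduce engine
```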
Upvotes: 0
Views: 630
Reputation: 117
Apache Spark builds a DAG (directed acyclic graph), whereas MapReduce sticks to its native map and reduce phases. During execution in Spark, the logical dependencies become physical dependencies.
Now what is a DAG?
A DAG captures the logical dependencies between operations before execution. (Think of it as a visual graph.)
When we have multiple map and reduce phases, or the output of one reduce is the input to another map, the DAG helps speed up the jobs.
The DAG is built in Tez (right side of the photo) but not in MapReduce (left side).
NOTE: Apache Spark works on a DAG but has stages in place of map/reduce. Tez has a DAG and works on map/reduce. To keep it simple I used the map/reduce terminology, but remember that Apache Spark has stages. The concept of the DAG remains the same.
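To make the DAG idea concrete, here is a minimal sketch in plain Python (the `Dataset` class and its methods are made up for illustration; this is not Spark's API): transformations only record logical dependencies, and nothing runs until `collect()` turns the logical plan into physical execution.

```python
# Illustrative lazy-evaluation sketch, not real Spark code.
class Dataset:
    def __init__(self, data, ops=None):
        self.data = data
        self.ops = ops or []  # the logical plan: a chain of recorded transformations

    def map(self, f):
        # Record the transformation; do not execute it yet.
        return Dataset(self.data, self.ops + [("map", f)])

    def filter(self, f):
        return Dataset(self.data, self.ops + [("filter", f)])

    def collect(self):
        # Only now do the logical dependencies become physical execution.
        out = self.data
        for kind, f in self.ops:
            if kind == "map":
                out = [f(x) for x in out]
            else:
                out = [x for x in out if f(x)]
        return out

ds = Dataset([1, 2, 3, 4]).map(lambda x: x * 10).filter(lambda x: x > 15)
print(ds.collect())  # [20, 30, 40]
```

Because the whole plan is known before anything runs, an engine like Spark can fuse consecutive steps into one stage instead of materializing each intermediate result.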
Reason 2: Map persists its output to disk (it buffers in memory too, but once the buffer is about 90% full, the output spills to disk), and from there the data goes to the merge phase. In Apache Spark, intermediate data is persisted to memory, which makes it faster. Check this link for details.
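The disk-versus-memory difference can be sketched in plain Python as well (both functions are illustrative stand-ins, not real MapReduce or Spark code): the first writes the map output to disk and re-reads it before reducing; the second keeps the intermediate result in memory between stages.

```python
import os
import tempfile

def mapreduce_style(nums):
    # MapReduce-style: the map output is persisted to disk, then re-read for reduce.
    with tempfile.NamedTemporaryFile("w", delete=False, suffix=".txt") as f:
        for n in nums:                         # "map" phase
            f.write(f"{n * n}\n")
        path = f.name
    with open(path) as f:                      # merge/reduce re-reads from disk
        total = sum(int(line) for line in f)
    os.unlink(path)
    return total

def spark_style(nums):
    # Spark-style: the intermediate data stays in memory between stages.
    squared = (n * n for n in nums)            # stage 1, never touches disk
    return sum(squared)                        # stage 2

print(mapreduce_style([1, 2, 3]), spark_style([1, 2, 3]))  # 14 14
```

Same answer either way; skipping the round trip through the filesystem is what makes the in-memory version faster.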
Upvotes: 1
Reputation: 18108
I understood it differently.
Normally Hive uses MR as the execution engine, unless you use Impala, but not all distros have that.
For a while now, though, Spark can also be used as the execution engine for Hive.
https://blog.cloudera.com/blog/2014/07/apache-hive-on-apache-spark-motivations-and-design-principles/ discusses this in more detail.
Upvotes: 1