shyamrag cp

Reputation: 102

Selecting the Hive execution engine

Of the three Hive execution engines shown below, which one is recommended when working on a Hadoop cluster? And what are the use cases for which each one is the ideal choice?

I tried a query with a sample size of 400M, and Tez returned the output faster than the other two. The query involves grouping and filtering.

set hive.execution.engine=spark;
set hive.execution.engine=tez;
set hive.execution.engine=mr;

What I am trying to work out is how, just by looking at a query, I can decide which engine will return results faster than the others.

Upvotes: 1

Views: 4963

Answers (1)

Srini

Reputation: 883

The benefits that Tez provides over the MapReduce execution engine when used with Hive are:
● Tez does not write intermediate results to HDFS between the steps of a Hive query. Tez models the query as a Directed Acyclic Graph (DAG), and the data from an intermediate step is passed on to the next step in the graph instead of being written out as it is with the MapReduce engine, where each job in the chain persists its output before the next one starts. Removing these IO operations saves a lot of time when dealing with large amounts of data.
● Tez and YARN together let you reuse containers, and the objects held in them, across tasks. If two tasks require the same object (say, a data frame) and run within the same container, you do not need to create the object again and again; you can reuse it. This leads to better management of resources and also helps improve performance.
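One way to check this for a particular query is to switch the engine per session and compare the plan and the elapsed time. The following is only a minimal sketch: the table `events` and its columns `country` and `status` are hypothetical, `mr` is deprecated from Hive 2 onwards, and `spark` can be substituted only where Hive on Spark is configured.

```sql
-- Hypothetical table and columns, used only for illustration.
set hive.execution.engine=tez;

-- Print the plan Hive would run under Tez for a grouping + filtering query.
EXPLAIN
SELECT country, count(*) AS cnt
FROM events
WHERE status = 'ok'
GROUP BY country;

-- Switch the engine, run the same query, and compare the elapsed time
-- reported by the client against the Tez run.
set hive.execution.engine=mr;

SELECT country, count(*) AS cnt
FROM events
WHERE status = 'ok'
GROUP BY country;
```

Under Tez, the EXPLAIN output should list the stages as vertices and edges of a single DAG, which makes it easier to see where intermediate data is handed from one step to the next.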

For the Spark engine, please see this discussion:

https://community.cloudera.com/t5/Support-Questions/Hive-execution-engine-set-to-Spark-is-recommended/m-p/177906

If you want to run interactive queries, then LLAP (Live Long and Process), which runs together with Tez, is suitable.
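As a rough sketch, a session can be pointed at LLAP with settings along these lines. This assumes LLAP daemons are already running in the cluster and that your Hive version supports these properties; the value `all` is just one of the possible choices for `hive.llap.execution.mode`.

```sql
-- Assumes LLAP daemons are already deployed and running.
set hive.execution.engine=tez;      -- LLAP works together with Tez
set hive.execution.mode=llap;       -- run work in LLAP daemons instead of plain containers
set hive.llap.execution.mode=all;   -- assumption: send all eligible work to LLAP
```

On many distributions these are preset on the interactive HiveServer2 instance, so it may be enough to connect to that endpoint instead of setting them by hand.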

Upvotes: 2
