Reputation: 1850
I use Apache Hive 2.1.1-cdh6.2.1 (the Cloudera distribution) with MR as the execution engine, and YARN's ResourceManager is configured with the Capacity Scheduler.
I'd like to try Spark as an execution engine for Hive. While going through the docs, I found a strange limitation:
Instead of the capacity scheduler, the fair scheduler is required. This fairly distributes an equal share of resources for jobs in the YARN cluster.
Since I already have all my queues set up properly under the capacity scheduler, switching schedulers is very undesirable for me.
Is it possible to run Hive on Spark with YARN capacity scheduler? If not, why?
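For reference, the scheduler is selected through yarn.resourcemanager.scheduler.class in yarn-site.xml. A minimal sketch of the relevant part of my current setup (the class names are the stock YARN ones; everything else about my configuration is omitted):

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

Following the docs would mean swapping that value for org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler and recreating all my queue definitions in fair-scheduler.xml.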
Upvotes: 5
Views: 897
Reputation: 605
We are running it at work through Beeline, as described in https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started, simply by writing the setting at the beginning of the SQL file we run:
set hive.execution.engine=spark;
select ... from table....
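Such a file can then be submitted through Beeline along these lines (the JDBC URL and file name are placeholders, not our actual values):

beeline -u "jdbc:hive2://hiveserver2-host:10000" -f query.sql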
We are not using the capacity scheduler because hundreds of jobs run per YARN queue, and when jobs are resource-hungry we have other queues to let them run in. The fair scheduler also lets us design a per-queue configuration based on job consumption that more realistically reflects the actual needs of each group of jobs.
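As a rough illustration of that kind of setup, a fair-scheduler.xml with per-queue weights could look like the sketch below (queue names and numbers are invented for illustration, not our production values):

<allocations>
  <queue name="etl">
    <weight>3.0</weight>
    <maxRunningApps>100</maxRunningApps>
  </queue>
  <queue name="adhoc">
    <weight>1.0</weight>
    <maxRunningApps>20</maxRunningApps>
  </queue>
</allocations>

The weights give each queue a proportional fair share of the cluster, which is what lets a resource-hungry group of jobs run in a bigger queue without starving the others.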
Hope this helps
Upvotes: 0
Reputation: 1743
I'm not sure you can execute Hive with the Spark engine in that setup. I highly recommend configuring Hive to use Tez instead (https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez), which is faster than MR and fairly similar to Spark in that it also uses a DAG as its task execution model.
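Once Tez is installed and configured as that page describes, switching the engine is the same kind of one-liner as the Spark example above:

set hive.execution.engine=tez;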
Upvotes: 0