Madhav Mishra

Reputation: 11

How to run Hue Hive Queries sequentially

I have set up Cloudera Hue on a cluster with a master node (200 GiB and 16 GiB RAM) and 3 data nodes (150 GiB and 8 GiB RAM each).

My database is approximately 70 GiB. The problem appears when I run Hive queries from the Hive editor (Hue GUI): if I submit 5 to 6 queries for execution, the jobs are started but they hang and never run. How can I run the queries sequentially? I still want to be able to submit several queries, but each new query should start only when the previous one has completed. Is there any way to make the queries run one by one?

Upvotes: 0

Views: 10287

Answers (3)

Amar

Reputation: 3845

You can submit all your queries in one go by separating them with ';' in Hue.

For example:

Query1; Query2; Query3

In this case Query1, Query2 and Query3 will run sequentially, one after another.
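
The behavior described above can be modeled with a short Python sketch. This is not Hue's implementation; `run_sequentially` and the `execute` callable are hypothetical names, and the split on ';' is naive (it would break on a semicolon inside a string literal). It only illustrates that each statement starts after the previous one has returned.

```python
def run_sequentially(script, execute):
    """Split a script on ';' and run each statement only after the previous one returns."""
    results = []
    for statement in script.split(";"):
        statement = statement.strip()
        if not statement:
            continue  # skip empty fragments, e.g. after a trailing ';'
        # execute() is assumed to block until the statement has finished
        results.append(execute(statement))
    return results


# Stand-in for a real Hive client call (an assumption): it only records the order.
order = []
run_sequentially("Query1; Query2; Query3", order.append)
print(order)
```

In a real setup `execute` would be whatever blocking call your Hive client provides.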

Upvotes: 1

Madhav Mishra

Reputation: 11

The overall flow in YARN/MR2 is as follows:

  1. A query is submitted from the Hue Hive query editor.
  2. A job is started, and the resource manager starts an application master on one of the data nodes.
  3. This application master asks the resource manager for resources (e.g. 2 * 1 GiB / 1 core).
  4. The resource manager grants these resources to the application master as containers on the node managers, which then run the map and reduce tasks.

So resource allocation is handled by YARN. On a Cloudera cluster, jobs are submitted to dynamic resource pools (a kind of queue), and YARN then allocates resources to the jobs in each pool. By default, the maximum number of concurrent jobs is set in such a way that the resource manager can allocate all the resources to the application masters of the submitted jobs, leaving no space for the task containers that those application masters need at a later stage to run their tasks.

http://www.cloudera.com/content/cloudera/en/resources/library/recordedwebinar/introduction-to-yarn-and-mapreduce-2-slides.html

So if we submit a large number of queries in the Hue Hive editor, they are submitted as jobs concurrently. Their application masters are allocated resources, which leaves no space for task containers, and so all the jobs stay in the pending state.

The solution is the one mentioned by @Romain:

Set the maximum number of concurrent jobs according to the size and capability of the cluster. In my case a value of 4 worked: now only 4 jobs from the pool run concurrently, and the resource manager allocates resources to them.
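
The arithmetic behind this can be sketched in a few lines of Python. The numbers are hypothetical (12 GiB of memory available to YARN and 2 GiB per application master are assumptions, not values from my cluster), and `memory_left_for_tasks` is a made-up helper, not a YARN API. It only shows why capping the number of running jobs leaves room for task containers.

```python
def memory_left_for_tasks(cluster_mem_gib, am_mem_gib, submitted_jobs, max_running_jobs=None):
    """Return the GiB left for task containers once every running job has its application master."""
    if max_running_jobs is None:
        running = submitted_jobs  # no cap: every submitted job gets an application master
    else:
        running = min(submitted_jobs, max_running_jobs)
    # Application masters are granted first and cannot take more than the cluster has.
    am_total = min(running * am_mem_gib, cluster_mem_gib)
    return cluster_mem_gib - am_total


# Hypothetical figures, for illustration only.
CLUSTER_GIB = 12
AM_GIB = 2

uncapped = memory_left_for_tasks(CLUSTER_GIB, AM_GIB, submitted_jobs=6)
capped = memory_left_for_tasks(CLUSTER_GIB, AM_GIB, submitted_jobs=6, max_running_jobs=4)
print(uncapped, capped)
```

With these assumed figures, six uncapped jobs leave nothing for map and reduce tasks, while a cap of 4 leaves 4 GiB free so the running jobs can make progress and the remaining two wait in the pool.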

Upvotes: 0

Romain

Reputation: 7082

Hue submits all the queries. If they hang, you are probably hitting a misconfiguration in YARN, such as gotcha #5 in http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/

Upvotes: 0
