makansij

Reputation: 9865

How do I limit the number of spark applications in state=RUNNING to 1 for a single queue in YARN?

I have multiple Spark jobs. Normally I submit my Spark jobs to YARN with an option, --yarn_queue, that tells them which YARN queue to enter.

But the jobs seem to run in parallel in the same queue. Sometimes the results of one Spark job are the inputs for the next Spark job. How do I run my Spark jobs sequentially rather than in parallel in the same queue?

I have looked at this page on the Capacity Scheduler. The closest thing I can see is the property yarn.scheduler.capacity.<queue>.maximum-applications, but that sets the number of applications that can be in both PENDING and RUNNING. I'm interested in limiting the number of applications that can be in the RUNNING state, and I don't care about the total number of applications in PENDING (or ACCEPTED, which is the same thing).
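For reference, that property is set per queue in capacity-scheduler.xml, roughly like this (a sketch; root.default is just an example queue path):

<property>
  <name>yarn.scheduler.capacity.root.default.maximum-applications</name>
  <value>1</value>
  <!-- Caps PENDING and RUNNING together, which is not what I want. -->
</property>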

How do I limit the number of applications in state=RUNNING to 1 for a single queue?

Upvotes: 3

Views: 2268

Answers (2)

rdeboo

Reputation: 377

From https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html:

The Fair Scheduler lets all apps run by default, but it is also possible to limit the number of running apps per user and per queue through the config file. This can be useful when a user must submit hundreds of apps at once, or in general to improve performance if running too many apps at once would cause too much intermediate data to be created or too much context-switching. Limiting the apps does not cause any subsequently submitted apps to fail, only to wait in the scheduler’s queue until some of the user’s earlier apps finish.

Specifically, you need to configure:

maxRunningApps: limit the number of apps from the queue to run at once

E.g.

<?xml version="1.0"?>
<allocations>
  <queue name="sample_queue">
    <maxRunningApps>1</maxRunningApps>
    <!-- other queue options -->
  </queue>
</allocations>
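For YARN to pick this up, the Fair Scheduler has to be the active scheduler and pointed at the allocation file. A minimal sketch of the corresponding yarn-site.xml settings (the file path is only an example):

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
  <name>yarn.scheduler.fair.allocation.file</name>
  <value>/etc/hadoop/conf/fair-scheduler.xml</value>
</property>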

Upvotes: 1

FaigB

Reputation: 2281

You can configure the appropriate queue to run one application at a time in the Capacity Scheduler configuration. My suggestion is to use Ambari for that purpose. If you don't have that option, follow the instructions in the guide.
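One way to approximate this with the Capacity Scheduler (a sketch, not something this answer spells out; whether exactly one application fits depends on queue capacity, ApplicationMaster container size, and your Hadoop version) is to cap the share of the queue that ApplicationMasters may use in capacity-scheduler.xml, so only one AM, and therefore one running application, fits at a time:

<!-- capacity-scheduler.xml; "root.myqueue" is an example queue path -->
<property>
  <name>yarn.scheduler.capacity.root.myqueue.maximum-am-resource-percent</name>
  <value>0.1</value>
  <!-- Small enough that only one ApplicationMaster fits in the queue at once;
       later applications wait in ACCEPTED until the running one finishes. -->
</property>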

Upvotes: 1
