Robert Rapplean

Reputation: 669

How can I tell how many mappers and reducers are running?

I have a task that is designed to run dozens of map/reduce jobs. Some of them are IO-intensive, some are mapper-intensive, some are reducer-intensive. I would like to monitor the number of mappers and reducers currently in use so that, when a set of mappers frees up, I can push another mapper-intensive job to the cluster. I don't want to just stack the jobs up in the queue, because they might clog the mappers and block the reducer-intensive jobs from running.

Is there a command-line interface I can call from (for instance) a Python script to get this information?

Upvotes: 0

Views: 1869

Answers (2)

Robert Rapplean

Reputation: 669

I discovered that

mapred job -list

will list all of the jobs currently running, and

mapred job -status <job_id>

will provide the number of mappers and reducers for each job.
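To drive this from a Python script, a minimal sketch that shells out to those same commands might look like the following. The parsing here is an assumption: the exact output format of mapred job -list varies between Hadoop versions, so adjust the filtering to match what your cluster prints.

    import subprocess

    def running_job_ids():
        # `mapred job -list` prints header lines followed by one row per job;
        # the first whitespace-separated field of each job row is the job ID.
        # (Assumes rows start with "job_" -- adjust for your Hadoop version.)
        out = subprocess.check_output(["mapred", "job", "-list"], text=True)
        return [line.split()[0] for line in out.splitlines()
                if line.strip().startswith("job_")]

    def job_status(job_id):
        # Returns the raw status text for one job; the mapper/reducer task
        # counts can then be pulled out of this text.
        return subprocess.check_output(["mapred", "job", "-status", job_id],
                                       text=True)

    for jid in running_job_ids():
        print(jid)
        print(job_status(jid))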

Upvotes: 0

Vijay Innamuri

Reputation: 4372

Hadoop job status can be accessed in the following ways.

  • Hadoop jobs can be administered through the Hadoop web UI.

    The JobTracker shows job details; its default port is 50030 (localhost:50030 in pseudo-distributed mode).

    TaskTrackers show the individual map/reduce tasks; each is available on its default port, 50060.

  • Hadoop provides a REST API to access the cluster, nodes, applications, and application history.

    This REST API can also be called from a Python script to get application status (see the sketch after this list). http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html
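As a minimal sketch of the REST approach, the following queries the YARN ResourceManager's cluster applications endpoint for running applications. The host and port 8088 are assumptions (8088 is the ResourceManager web services default); adjust them for your cluster.

    import json
    import urllib.request

    # ResourceManager web services endpoint; host/port are assumptions here.
    RM_URL = "http://localhost:8088/ws/v1/cluster/apps?states=RUNNING"

    # Ask for JSON explicitly so the response format is unambiguous.
    req = urllib.request.Request(RM_URL, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)

    # "apps" is null when nothing is running; each app entry carries its
    # type, state, and progress, so MapReduce jobs can be filtered out.
    for app in (data["apps"] or {}).get("app", []):
        print(app["id"], app["applicationType"], app["state"], app["progress"])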

Upvotes: 2
