Reputation: 669
I have a task that is designed to run dozens of map/reduce jobs. Some of them are IO-intensive, some are mapper-intensive, some are reducer-intensive. I would like to monitor the number of mappers and reducers currently in use so that, when a set of mappers frees up, I can push another mapper-intensive job to the cluster. I don't want to simply stack them up on the queue, because they might clog up the mappers and keep the reducer-intensive jobs from running.
Is there a command line interface I can call to get this information from (for instance) a Python script?
Upvotes: 0
Views: 1869
Reputation: 669
I discovered that

    mapred job -list

will list all of the jobs currently running, and

    mapred job -status <job_id>

will report the number of mappers and reducers for each job.
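For example, here is a minimal Python sketch that shells out to those two commands. The exact text that mapred job -status prints varies by Hadoop version, so the "Number of maps" / "Number of reduces" patterns below are assumptions you may need to adjust for your cluster:

    import re
    import subprocess

    def running_job_ids():
        """Return the job IDs reported by `mapred job -list`."""
        out = subprocess.check_output(["mapred", "job", "-list"], text=True)
        # Job IDs look like job_1400000000000_0001; pattern assumed from Hadoop 2.x output.
        return re.findall(r"\bjob_\d+_\d+\b", out)

    def map_reduce_counts(job_id):
        """Return (num_maps, num_reduces) parsed from `mapred job -status`."""
        out = subprocess.check_output(["mapred", "job", "-status", job_id], text=True)
        maps = re.search(r"Number of maps:\s*(\d+)", out)
        reduces = re.search(r"Number of reduces:\s*(\d+)", out)
        return (int(maps.group(1)) if maps else None,
                int(reduces.group(1)) if reduces else None)

    for job_id in running_job_ids():
        print(job_id, map_reduce_counts(job_id))

Polling this in a loop lets you wait until map slots free up before submitting the next mapper-intensive job.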
Upvotes: 0
Reputation: 4372
Hadoop job status can be accessed in the following ways.
Hadoop jobs can be administered through the Hadoop web UI.
The JobTracker shows job details; its default port is 50030 (localhost:50030 in pseudo-distributed mode).
TaskTrackers show the individual map/reduce tasks and are available at the default port 50060.
Hadoop also provides a REST API to access the cluster, nodes, applications, and application history.
This REST API can also be called from a Python script to get the application status. http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html
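For example, a minimal Python sketch against the ResourceManager's REST endpoints (rm-host:8088 is an assumed address; 8088 is the ResourceManager's default web port):

    import json
    from urllib.request import urlopen

    # Assumed ResourceManager address; adjust the host for your cluster.
    RM = "http://rm-host:8088"

    # List the applications currently running on the cluster.
    with urlopen(RM + "/ws/v1/cluster/apps?states=RUNNING") as resp:
        data = json.load(resp)

    # "apps" is null in the response when nothing matches.
    apps = (data.get("apps") or {}).get("app", [])
    for app in apps:
        print(app["id"], app["name"], app["state"])

    # Cluster-wide resource usage is exposed by the metrics endpoint.
    with urlopen(RM + "/ws/v1/cluster/metrics") as resp:
        metrics = json.load(resp)["clusterMetrics"]
    print("apps running:", metrics["appsRunning"],
          "containers allocated:", metrics["containersAllocated"])

The per-application responses also include progress and resource usage, which is enough information to decide when to submit the next job.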
Upvotes: 2