Abhijeet Sachdev

Reputation: 43

How does the Hadoop History Server work?

There are two properties in the configuration files that I am confused about:

  1. The property yarn.nodemanager.remote-app-log-dir in yarn-site.xml:

    a.) Does this property control where the logs of map/reduce tasks are written?

    b.) Is this the responsibility of the Node Manager (NM)?

  2. The property mapreduce.jobhistory.done-dir in mapred-site.xml:

    a.) Are job-related files, such as configurations, stored in this location?

    b.) Is this the responsibility of the Application Master (AM)?

  3. Does the History Server (HS) combine both of these sources and show consolidated information in its UI?

Upvotes: 1

Views: 776

Answers (1)

kylin

Reputation: 21

Assuming you have enabled log aggregation:

  • 1.a. Yes. This is the log-aggregation directory, usually on HDFS, into which the NMs aggregate container logs.
  • 1.b. Yes.
  • 2.a. Yes.
  • 2.b. No. The MR JobHistory Server does that: it deletes the JobSummary file and moves the other files from ${mapreduce.jobhistory.intermediate-done-dir} to ${mapreduce.jobhistory.done-dir}.
  • 3. Yes. The MR JobHistory Server web UI shows both job information (from ${mapreduce.jobhistory.done-dir}) and container logs (from ${yarn.nodemanager.remote-app-log-dir}).
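
For reference, the properties mentioned above could be set roughly as in the sketch below. The property names are the real ones discussed in this thread, plus yarn.log-aggregation-enable, which turns log aggregation on. The path values are only illustrative placeholders, not values taken from the question, so adjust them to your cluster.

```xml
<!-- yarn-site.xml (fragment) -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <!-- Where the NMs aggregate container logs; placeholder path -->
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/tmp/logs</value>
</property>

<!-- mapred-site.xml (fragment) -->
<property>
  <!-- Source directory the JobHistory Server moves files from; placeholder path -->
  <name>mapreduce.jobhistory.intermediate-done-dir</name>
  <value>/mr-history/tmp</value>
</property>
<property>
  <!-- Final location of job history files; placeholder path -->
  <name>mapreduce.jobhistory.done-dir</name>
  <value>/mr-history/done</value>
</property>
```

With settings like these, the job information shown in the JobHistory Server UI would come from the done-dir, and the container logs from the remote-app-log-dir.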

Upvotes: 1
