Reputation: 41
YARN container logs are normally available under "/var/log/hadoop-yarn/containers". I can see the logs for successful jobs there, but not for failed jobs. The NodeManager log shows the logs being deleted.
Log:
2017-07-13 14:16:04,170 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor (DeletionService #1): Deleting path : /var/log/hadoop-yarn/containers/application_1234567890_12345/container_11234567890_12345_11_000001/stdout
2017-07-13 14:16:04,180 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl (LogAggregationService #6093): renaming /var/log/hadoop-yarn/apps/hadoop/logs/application_1234567890_12345/xx.xx.xx.xx_8041.tmp to /var/log/hadoop-yarn/apps/hadoop/logs/application_1234567890_12345/xx.xx.xx.xx_8041
2017-07-13 14:16:04,181 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor (DeletionService #3): Deleting path : /var/log/hadoop-yarn/containers/application_1234567890_12345
2017-07-13 14:16:06,048 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl (Container Monitor): Stopping resource-monitoring for container_11234567890_12345_11_0000
Can someone please advise which config needs to be modified to retain logs for failed jobs?
Here's a snippet of my yarn-site.xml:
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://ip-XX.XX.XX.XX:19888/jobhistory/logs</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/mnt/yarn</value>
<final>true</final>
</property>
<property>
<description>Where to store container logs.</description>
<name>yarn.nodemanager.log-dirs</name>
<value>/var/log/hadoop-yarn/containers</value>
</property>
<property>
<description>Where to aggregate logs to.</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/var/log/hadoop-yarn/apps</value>
</property>
<property>
<name>yarn.log-aggregation.enable-local-cleanup</name>
<value>true</value>
</property>
<property>
<name>yarn.scheduler.increment-allocation-mb</name>
<value>32</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
Upvotes: 3
Views: 4700
Reputation: 911
The logs get moved to HDFS when log aggregation is done. The target is usually /app-logs on HDFS.
Check the following settings in the documentation:
yarn.nodemanager.remote-app-log-dir: Normally /app-logs on HDFS, but in your case it is set to /var/log/hadoop-yarn/apps. Does this directory exist on HDFS? It looks like a local directory value was put here by mistake.
Other settings that may be useful:
yarn.log-aggregation-enable: If this is enabled, the NodeManager will immediately concatenate all of the container logs into one file, upload it to HDFS under ${yarn.nodemanager.remote-app-log-dir}/${user.name}/logs/, and delete the originals from the local userlogs directory.
yarn.nodemanager.delete.debug-delay-sec: Number of seconds after an application finishes before the nodemanager's DeletionService will delete the application's localized file directory and log directory. To diagnose Yarn application problems, set this property's value large enough (for example, to 600 = 10 minutes) to permit examination of these directories. After changing the property's value, you must restart the nodemanager in order for it to have an effect.
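As a rough sketch of that last setting, assuming it goes into the same yarn-site.xml shown in the question, something like the following should keep the local container log directories around for about 10 minutes after an application finishes. The value 600 is only the example figure from the description above, not a recommendation, so pick whatever window you need for debugging.

```xml
<property>
  <!-- Assumed example value: delay local cleanup by 600 seconds (10 minutes)
       so container logs of finished or failed applications can be inspected.
       Restart the NodeManager after changing this. -->
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>600</value>
</property>
```

Keep in mind that a large delay leaves localized files and logs on the node's local disks for longer, so it is probably best treated as a temporary debugging setting.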
Upvotes: 2