ilya

Reputation: 129

Where to find node logs in an AWS EMR cluster?

I have a PySpark program running on an AWS EMR cluster. The cluster configuration is: emr-5.31.0, Hadoop 2.10.0, Hive 2.3.7, Hue 4.7.1, Pig 0.17.0.

The program processes some files on HDFS, but at some point it starts getting errors.

In the Amazon console, under YARN applications - application_XXX (Spark) - executors - driver - stderr, I see: 'could not obtain block ... file='

Shortly before this message there is 'Task 0 in stage 35 failed 4 times. aborting job'.

If I go to Amazon console - YARN applications - application_XXX (Spark) - stages - 35 - tasks - 0 - stdout, I don't see anything wrong at first glance except a lot of 'GC (Allocation Failure)' messages.

Its stderr has a WARN: 'Could not obtain block XXX, file= No live nodes contain current block Block locations: Dead nodes: . Throwing a BlockMissingException'.

If I go to the Monitoring tab - node status, I see that one node became unhealthy at that time, and that's all. The number of nodes also changed on the 'Live data nodes', 'MR total nodes', 'MR active nodes', and 'MR lost nodes' charts.

As I understand it, the task cannot find the file on HDFS because the node that hosted its block became unhealthy.

My question is: where can I find the reason the node became unhealthy? I wasn't able to find any other logs in the Amazon console. Maybe there are some node-local places where this reason is stored?

Upvotes: 2

Views: 2165

Answers (3)

ilya

Reputation: 129

Actually, on Amazon there are more logs accessible via the S3 log location: logs from the node boot and configuration phase, and logs from the services running on each node (HDFS and YARN), which are the ones I was looking for. The path looks like this: s3 location/cluster id/node/node id/applications. There I was able to find the HDFS and YARN logs.
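For anyone who wants to script this, here is a minimal sketch in Python of building that prefix and listing what is under it. The function names are my own, the bucket, cluster id and node id are placeholders you have to fill in, and the listing function assumes you pass it a boto3 S3 client (or anything with the same `get_paginator` interface):

```python
from urllib.parse import urlparse


def node_log_prefix(log_uri, cluster_id, node_id):
    """Split the cluster's S3 log URI into (bucket, key prefix) for one node.

    Follows the layout described above:
    <s3 location>/<cluster id>/node/<node id>/applications/
    """
    parsed = urlparse(log_uri)
    if parsed.scheme not in ("s3", "s3n", "s3a"):
        raise ValueError(f"not an S3 URI: {log_uri!r}")
    base = parsed.path.strip("/")
    # Skip the base path if the log URI points at the bucket root.
    parts = [p for p in (base, cluster_id, "node", node_id, "applications") if p]
    return parsed.netloc, "/".join(parts) + "/"


def list_node_logs(s3_client, log_uri, cluster_id, node_id):
    """Yield every object key stored under the node's applications/ prefix."""
    bucket, prefix = node_log_prefix(log_uri, cluster_id, node_id)
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

With boto3 installed and credentials configured, it could be called as `list_node_logs(boto3.client("s3"), "<log uri>", "<cluster id>", "<node id>")`, using the Log URI shown for your cluster.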

Upvotes: 1

Joey Lesh

Reputation: 426

On the Summary page for your EMR cluster there is a section named "Configuration details".

Below that, there is a label named "Log URI". It points to an S3 URI, and next to it there is a small folder icon.

Click on that icon and you can browse to the logs on the nodes for your EMR cluster.

Upvotes: 1

Hi, I launched an EMR cluster myself some time ago, but I don't remember the details about the logs. Consulting the docs here:

https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-manage-view-web-log-files.html

They state that the logs are stored on the machines (for which I assume you have the keys), and that they are also stored on S3 by default. I'm not sure which bucket they are created in.
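Once the logs have been copied from the machines or from S3 into a local directory, a small script can help find the lines around the time the node went bad. This is only a rough sketch, assuming a local directory where some of the files may be gzip-compressed; the function name `find_lines` and the default keyword are my own choices, not anything EMR defines:

```python
import gzip
from pathlib import Path


def find_lines(log_dir, keyword="unhealthy"):
    """Yield (path, line number, text) for each log line containing keyword.

    The match is case-insensitive. Files ending in .gz are read through gzip,
    everything else is read as plain text.
    """
    needle = keyword.lower()
    for path in sorted(Path(log_dir).rglob("*")):
        if not path.is_file():
            continue
        opener = gzip.open if path.suffix == ".gz" else open
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
            for number, line in enumerate(fh, start=1):
                if needle in line.lower():
                    yield path, number, line.rstrip("\n")
```

For example, `for path, number, text in find_lines("./node-logs"): print(path, number, text)` would print every matching line, where `./node-logs` is a hypothetical download directory.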

Best Regards :)

Upvotes: 1
