Andrey

Reputation: 6367

How to see output from executors on Amazon EMR?

I am running the following code on AWS EMR:

from pyspark.sql import SparkSession
spark = SparkSession\
    .builder\
    .appName("PythonPi")\
    .getOrCreate()
sc = spark.sparkContext

def f(_):
    print("executor running") # <= I cannot find this output
    return 1

from operator import add
output = sc.parallelize(range(1, 3), 2).map(f).reduce(add)
print(output) # <= I found this output
spark.stop()

I am recording logs to s3 (Log URI is s3://brand17-logs/).

I can see output from master node here:

s3://brand17-logs/j-20H1NGEP519IG/containers/application_1618292556240_0001/container_1618292556240_0001_01_000001/stdout.gz

Where can I see output from executor node ?

I see this output when running locally.

Upvotes: 1

Views: 2663

Answers (1)

Ajay Kr Choudhary

Reputation: 1352

You are almost there while browsing the log files.

The general convention of the stored logs is as follows: inside the containers path there are one or more application ids, and under each application there are multiple container directories. The first container (something like container_1618292556240_0001_01_000001, ending with 000001) belongs to the driver node, and the rest belong to the executors.

I have not found official documentation that states this convention, but I have seen it hold across all of my clusters.

So if you browse to the other container directories, you will be able to see the executor log files.

Having said that, it is painful to browse through so many executor directories searching for a particular log message.

How I personally view the logs from an EMR cluster:

  1. Log in to an EC2 instance that has enough access to download files from the S3 bucket where the EMR logs are saved.

  2. Navigate to the right path on the instance.

    mkdir -p /tmp/debug-log/ && cd /tmp/debug-log/

  3. Download all the files from S3 in a recursive manner.

    aws s3 cp --recursive s3://your-bucket-name/cluster-id/ .

In your case, it would be:

    aws s3 cp --recursive s3://brand17-logs/j-20H1NGEP519IG/ .

  4. Uncompress the log files:

    find . -type f -exec gunzip {} \;

Now that all the compressed files are uncompressed, we can search them recursively:

  5. Do a recursive grep for the message:

    grep -inR "message-that-i-am-looking-for"

The flags passed to grep mean the following:

i -> case-insensitive match
n -> display the line number where the message is present
R -> search recursively through directories
  6. Open the file pointed to by the grep output (for example in vi) and inspect the surrounding log lines for more context.
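The steps above can be combined into a short shell sketch. The bucket name and cluster id come from the question; the actual S3 download requires AWS credentials, so that line is left commented out and a sample executor log is created locally instead, so the uncompress-and-grep steps can be tried anywhere:

```shell
# Sketch of the workflow above (bucket/cluster id from the question).
mkdir -p /tmp/debug-log/ && cd /tmp/debug-log/

# Step 3 -- requires an instance/profile with read access to the log bucket:
# aws s3 cp --recursive s3://brand17-logs/j-20H1NGEP519IG/ .

# Simulated executor container log, mirroring the layout from the question:
mkdir -p containers/application_1618292556240_0001/container_1618292556240_0001_01_000002
echo "executor running" | gzip > containers/application_1618292556240_0001/container_1618292556240_0001_01_000002/stdout.gz

# Steps 4 and 5: uncompress everything, then grep recursively
# (-f lets gunzip overwrite output left over from a previous run):
find . -type f -name '*.gz' -exec gunzip -f {} \;
grep -inR "executor running" containers/
```

The grep output names the container directory that produced the message, which tells you which executor wrote it.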

More readings can be found here:

View Log Files

access spark log

Upvotes: 2
