Sowmya

Reputation: 91

Code failure on AWS EMR while running PySpark

I am trying to install and run PySpark in a Jupyter notebook on AWS Elastic MapReduce (EMR). This is my current session configuration and the code I run:

%%info

Current session configs: {'driverMemory': '1000M', 'executorCores': 2, 'kind': 'pyspark'}
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("docker-numpy").getOrCreate()
sc = spark.sparkContext

Output

The code failed because of a fatal error:
    Unable to create Session. Error: Unexpected endpoint: http://172.31.3.115:8998.

Some things to try:
a) Make sure Spark has enough available resources for Jupyter to create a Spark context.
b) Contact your Jupyter administrator to make sure the Spark magics library is configured correctly.
c) Restart the kernel.
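
For what it's worth, sparkmagic only talks to Livy's REST API (port 8998 here), so a quick way to see whether the endpoint itself responds, independently of Jupyter, is a check like the one below (a diagnostic sketch, not part of my original setup):

import requests

# Livy endpoint from .sparkmagic/config; adjust the IP if your master differs
LIVY_URL = "http://172.31.3.115:8998"

# List existing Livy sessions; any HTTP response at all means the endpoint is reachable
resp = requests.get(LIVY_URL + "/sessions")
print(resp.status_code)
print(resp.json())

If this times out or the connection is refused, the issue is likely connectivity (security groups, wrong IP) rather than the sparkmagic configuration itself.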

In the error above, 172.31.3.115 is my master node's internal/private IP. On the notebook instance I have already edited the sparkmagic configuration (notebook@ip-x-x-x-x$ more .sparkmagic/config) so that every kernel points to that endpoint:

{
  "kernel_python_credentials" : {
    "username": "",
    "password": "",
    "url": "http://172.31.3.115:8998",
    "auth": "None"
  },

  "kernel_scala_credentials" : {
    "username": "",
    "password": "",
    "url": "http://172.31.3.115:8998",
    "auth": "None"
  },
  "kernel_r_credentials": {
    "username": "",
    "password": "",
    "url": "http://172.31.3.115:8998"
  },

  "logging_config": {
    "version": 1,
    "formatters": {
      "magicsFormatter": { 
        "format": "%(asctime)s\t%(levelname)s\t%(message)s",
        "datefmt": ""
      }
    },
    "handlers": {
      "magicsHandler": { 
        "class": "hdijupyterutils.filehandler.MagicsFileHandler",
        "formatter": "magicsFormatter",
        "home_path": "~/.sparkmagic"
      }
    },
    "loggers": {
      "magicsLogger": { 
        "handlers": ["magicsHandler"],
        "level": "DEBUG",
        "propagate": 0
      }
    }
  },

  "wait_for_idle_timeout_seconds": 15,
  "livy_session_startup_timeout_seconds": 60,

  "fatal_error_suggestion": "The code failed because of a fatal error:\n\t{}.\n\nSome things to try:\na) Make sure Spark has enough available resources for Jupyter to create a Spark context.\nb) Contact your Jupyter administrator to make sure the Spark magics library is configured correctly.\nc) Restart the kernel.",

  "ignore_ssl_errors": false,

  "session_configs": {
    "driverMemory": "1000M",
    "executorCores": 2
  },

  "use_auto_viz": true,
  "coerce_dataframe": true,
  "max_results_sql": 2500,
  "pyspark_dataframe_encoding": "utf-8",

  "heartbeat_refresh_seconds": 30,
  "livy_server_heartbeat_timeout_seconds": 0,
  "heartbeat_retry_seconds": 10,

  "server_extension_default_kernel_name": "pysparkkernel",
  "custom_headers": {},

  "retry_policy": "configurable",
  "retry_seconds_to_sleep_list": [0.2, 0.5, 1, 3, 5],
  "configurable_retry_policy_max_retries": 8
}
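
sparkmagic forwards these session_configs to Livy's POST /sessions call, so a session can also be created by hand to see whether the failure is on the sparkmagic side or on the Livy/YARN side (again only a rough sketch using the same settings as above):

import json
import time
import requests

LIVY_URL = "http://172.31.3.115:8998"

# Same settings as session_configs in .sparkmagic/config
payload = {"kind": "pyspark", "driverMemory": "1000M", "executorCores": 2}
resp = requests.post(LIVY_URL + "/sessions",
                     data=json.dumps(payload),
                     headers={"Content-Type": "application/json"})
session = resp.json()
print("created session", session["id"], "in state", session["state"])

# Poll until the session either becomes idle or dies
while True:
    state = requests.get(LIVY_URL + "/sessions/" + str(session["id"])).json()["state"]
    print("state:", state)
    if state not in ("not_started", "starting"):
        break
    time.sleep(5)

If the session ends up dead here as well, the problem is on the YARN side (which is what the diagnostics further down suggest) rather than in the notebook or the config above.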

Like many others, I have tried 1, 2. First of all, I am not able to locate SPARK_HOME on EMR. I also have a question: how do I install Livy on EMR, or set Advanced Cluster Options? I create the cluster manually with the AWS CLI as follows

aws emr create-cluster \
 --name 'EMR 6.0.0 with Docker' \
 --release-label emr-6.0.0 \
 --applications Name=Livy Name=Spark Name=Hadoop Name=JupyterHub \
 --ec2-attributes "KeyName=sowmya_private_key,SubnetId=subnet-b39550d8" \
 --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m5.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m5.xlarge \
 --use-default-roles \
 --configurations file://./emr-configuration.json

which only returns the new cluster ID

{
    "ClusterId": "j-3T56U7A09JWAD"
}
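
Since create-cluster returns immediately with just the ID, the cluster state has to be checked separately before assuming it is up, e.g. with boto3 (a rough sketch, equivalent to aws emr describe-cluster; the region is the one from my logs):

import time
import boto3

emr = boto3.client("emr", region_name="us-east-2")

# Poll the cluster state until it is ready to accept work (or has failed)
while True:
    state = emr.describe_cluster(ClusterId="j-3T56U7A09JWAD")["Cluster"]["Status"]["State"]
    print("cluster state:", state)
    if state in ("WAITING", "TERMINATED", "TERMINATED_WITH_ERRORS"):
        break
    time.sleep(30)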

I have been following these AWS tutorials:

https://aws.amazon.com/blogs/machine-learning/build-amazon-sagemaker-notebooks-backed-by-spark-in-amazon-emr/

and

https://aws.amazon.com/blogs/big-data/simplify-your-spark-dependency-management-with-docker-in-emr-6-0-0/

Privacy aside, here is the full error log:

The code failed because of a fatal error:
    Session 1 unexpectedly reached final status 'dead'. See logs:
stdout: 

stderr: 
20/06/06 04:05:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/06/06 04:05:16 INFO RMProxy: Connecting to ResourceManager at ip-172-31-3-115.us-east-2.compute.internal/172.31.3.115:8032
20/06/06 04:05:16 INFO Client: Requesting a new application from cluster with 2 NodeManagers
20/06/06 04:05:16 INFO Configuration: resource-types.xml not found
20/06/06 04:05:16 INFO ResourceUtils: Unable to find 'resource-types.xml'.
20/06/06 04:05:16 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (12288 MB per container)
20/06/06 04:05:16 INFO Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
20/06/06 04:05:16 INFO Client: Setting up container launch context for our AM
20/06/06 04:05:16 INFO Client: Setting up the launch environment for our AM container
20/06/06 04:05:16 INFO Client: Preparing resources for our AM container
20/06/06 04:05:16 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
20/06/06 04:05:18 INFO Client: Uploading resource file:/mnt/tmp/spark-0cd5b0e0-9c69-4105-835f-ce1c484787d4/__spark_libs__3675935773843248835.zip -> hdfs://ip-172-31-3-115.us-east-2.compute.internal:8020/user/livy/.sparkStaging/application_1591413438501_0002/__spark_libs__3675935773843248835.zip
20/06/06 04:05:18 INFO Client: Uploading resource file:/usr/lib/livy/rsc-jars/livy-api-0.6.0-incubating.jar -> hdfs://ip-172-31-3-115.us-east-2.compute.internal:8020/user/livy/.sparkStaging/application_1591413438501_0002/livy-api-0.6.0-incubating.jar
20/06/06 04:05:18 INFO Client: Uploading resource file:/usr/lib/livy/rsc-jars/livy-rsc-0.6.0-incubating.jar -> hdfs://ip-172-31-3-115.us-east-2.compute.internal:8020/user/livy/.sparkStaging/application_1591413438501_0002/livy-rsc-0.6.0-incubating.jar
20/06/06 04:05:18 INFO Client: Uploading resource file:/usr/lib/livy/rsc-jars/netty-all-4.1.17.Final.jar -> hdfs://ip-172-31-3-115.us-east-2.compute.internal:8020/user/livy/.sparkStaging/application_1591413438501_0002/netty-all-4.1.17.Final.jar
20/06/06 04:05:18 INFO Client: Uploading resource file:/usr/lib/livy/repl_2.12-jars/commons-codec-1.9.jar -> hdfs://ip-172-31-3-115.us-east-2.compute.internal:8020/user/livy/.sparkStaging/application_1591413438501_0002/commons-codec-1.9.jar
20/06/06 04:05:19 INFO Client: Uploading resource file:/usr/lib/livy/repl_2.12-jars/livy-core_2.12-0.6.0-incubating.jar -> hdfs://ip-172-31-3-115.us-east-2.compute.internal:8020/user/livy/.sparkStaging/application_1591413438501_0002/livy-core_2.12-0.6.0-incubating.jar
20/06/06 04:05:19 INFO Client: Uploading resource file:/usr/lib/livy/repl_2.12-jars/livy-repl_2.12-0.6.0-incubating.jar -> hdfs://ip-172-31-3-115.us-east-2.compute.internal:8020/user/livy/.sparkStaging/application_1591413438501_0002/livy-repl_2.12-0.6.0-incubating.jar
20/06/06 04:05:19 INFO Client: Uploading resource file:/usr/lib/spark/R/lib/sparkr.zip#sparkr -> hdfs://ip-172-31-3-115.us-east-2.compute.internal:8020/user/livy/.sparkStaging/application_1591413438501_0002/sparkr.zip
20/06/06 04:05:19 INFO Client: Uploading resource file:/usr/lib/spark/python/lib/pyspark.zip -> hdfs://ip-172-31-3-115.us-east-2.compute.internal:8020/user/livy/.sparkStaging/application_1591413438501_0002/pyspark.zip
20/06/06 04:05:19 INFO Client: Uploading resource file:/usr/lib/spark/python/lib/py4j-0.10.7-src.zip -> hdfs://ip-172-31-3-115.us-east-2.compute.internal:8020/user/livy/.sparkStaging/application_1591413438501_0002/py4j-0.10.7-src.zip
20/06/06 04:05:19 WARN Client: Same name resource file:///usr/lib/spark/python/lib/pyspark.zip added multiple times to distributed cache
20/06/06 04:05:19 WARN Client: Same name resource file:///usr/lib/spark/python/lib/py4j-0.10.7-src.zip added multiple times to distributed cache
20/06/06 04:05:19 INFO Client: Uploading resource file:/mnt/tmp/spark-0cd5b0e0-9c69-4105-835f-ce1c484787d4/__spark_conf__7110997886244851568.zip -> hdfs://ip-172-31-3-115.us-east-2.compute.internal:8020/user/livy/.sparkStaging/application_1591413438501_0002/__spark_conf__.zip
20/06/06 04:05:20 INFO SecurityManager: Changing view acls to: livy
20/06/06 04:05:20 INFO SecurityManager: Changing modify acls to: livy
20/06/06 04:05:20 INFO SecurityManager: Changing view acls groups to: 
20/06/06 04:05:20 INFO SecurityManager: Changing modify acls groups to: 
20/06/06 04:05:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(livy); groups with view permissions: Set(); users  with modify permissions: Set(livy); groups with modify permissions: Set()
20/06/06 04:05:21 INFO Client: Submitting application application_1591413438501_0002 to ResourceManager
20/06/06 04:05:21 INFO YarnClientImpl: Submitted application application_1591413438501_0002
20/06/06 04:05:21 INFO Client: Application report for application_1591413438501_0002 (state: ACCEPTED)
20/06/06 04:05:21 INFO Client: 
     client token: N/A
     diagnostics: [Sat Jun 06 04:05:21 +0000 2020] Application is Activated, waiting for resources to be assigned for AM.  Details : AM Partition = <DEFAULT_PARTITION> ; Partition Resource = <memory:24576, vCores:8> ; Queue's Absolute capacity = 100.0 % ; Queue's Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 100.0 % ; Queue's capacity (absolute resource) = <memory:24576, vCores:8> ; Queue's used capacity (absolute resource) = <memory:0, vCores:0> ; Queue's max capacity (absolute resource) = <memory:24576, vCores:8> ; 
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1591416321309
     final status: UNDEFINED
     tracking URL: http://ip-172-31-3-115.us-east-2.compute.internal:20888/proxy/application_1591413438501_0002/
     user: livy
20/06/06 04:05:21 INFO ShutdownHookManager: Shutdown hook called
20/06/06 04:05:21 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-0cd5b0e0-9c69-4105-835f-ce1c484787d4
20/06/06 04:05:21 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-d83d52f6-d17d-4e29-a562-7013ed539e1a

YARN Diagnostics: 
Application application_1591413438501_0002 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1591413438501_0002_000001 exited with  exitCode: 7
Failing this attempt.Diagnostics: [2020-06-06 04:05:25.619]Exception from container-launch.
Container id: container_1591413438501_0002_01_000001
Exit code: 7
Exception message: Launch container failed
Shell error output: Unable to find image '839713865431.dkr.ecr.us-east-2.amazonaws.com/emr-docker-examples:pyspark-latest' locally
/usr/bin/docker: Error response from daemon: manifest for 839713865431.dkr.ecr.us-east-2.amazonaws.com/emr-docker-examples:pyspark-latest not found: manifest unknown: Requested image not found.
See '/usr/bin/docker run --help'.

Shell output: main : command provided 4
main : run as user is hadoop
main : requested yarn user is livy
Creating script paths...
Creating local dirs...
Getting exit code file...
Changing effective user to root...
Wrote the exit code 7 to /mnt/yarn/nmPrivate/application_1591413438501_0002/container_1591413438501_0002_01_000001/container_1591413438501_0002_01_000001.pid.exitcode


[2020-06-06 04:05:25.645]Container exited with a non-zero exit code 7. Last 4096 bytes of stderr.txt :


[2020-06-06 04:05:25.646]Container exited with a non-zero exit code 7. Last 4096 bytes of stderr.txt :


For more detailed output, check the application tracking page: http://ip-172-31-3-115.us-east-2.compute.internal:8088/cluster/app/application_1591413438501_0002 Then click on links to logs of each attempt.
. Failing the application..

Some things to try:
a) Make sure Spark has enough available resources for Jupyter to create a Spark context.
b) Contact your Jupyter administrator to make sure the Spark magics library is configured correctly.
c) Restart the kernel.
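
The lines about 839713865431.dkr.ecr.us-east-2.amazonaws.com/emr-docker-examples:pyspark-latest ("manifest unknown: Requested image not found") seem to be the actual reason the container launch fails, so one thing worth verifying is whether that tag exists in ECR at all, e.g. (a sketch, assuming credentials with ECR read access):

import boto3

ecr = boto3.client("ecr", region_name="us-east-2")

try:
    # Check whether the pyspark-latest tag exists in the emr-docker-examples repository
    resp = ecr.describe_images(
        repositoryName="emr-docker-examples",
        imageIds=[{"imageTag": "pyspark-latest"}],
    )
    print("found image with tags:", resp["imageDetails"][0]["imageTags"])
except ecr.exceptions.ImageNotFoundException:
    print("tag pyspark-latest is not present in emr-docker-examples")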

Upvotes: 1

Views: 4276

Answers (1)

CH Liu

Reputation: 1884

I usually use the following steps to create a cluster:

  1. Create an EMR cluster using AWS Management Console.

  2. Choose emr-5.25.0.

  3. The only application I choose is Spark.

  4. Add the following configuration so PySpark uses Python 3 by default (the check after these steps confirms it):

    [
      {
        "Classification": "spark-env",
        "Configurations": [
          {
            "Classification": "export",
            "Properties": {
               "PYSPARK_PYTHON": "/usr/bin/python3"
            }
          }
        ]
      }
    ]
    
  5. Click Create cluster.

  6. Open a terminal session to SSH into the master node and install jupyterlab:

    sudo pip-3.6 install jupyterlab
    
  7. Start JupyterLab by launching pyspark with the Jupyter driver options:

    export PYSPARK_DRIVER_PYTHON=$(which jupyter)
    export PYSPARK_DRIVER_PYTHON_OPTS="lab --ip=0.0.0.0"
    
    pyspark --master yarn --driver-memory 8g --executor-memory 20g --executor-cores 4
    
  8. Open a second terminal session and start an SSH tunnel to the master node:

    ssh -i /path/to/ssh/key.pem -ND 8157 hadoop@master-ip-address
    

That's it.
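
With the browser going through the SOCKS proxy that the -D 8157 tunnel provides, open JupyterLab and run a quick check in a notebook to confirm the session is on YARN and picked up Python 3 from the configuration in step 4 (sc and spark already exist because the kernel is the pyspark driver):

    # `sc` is created by pyspark itself when JupyterLab is launched this way
    print(sc.version)    # Spark version
    print(sc.master)     # should be "yarn"
    print(sc.pythonVer)  # should start with "3" because of PYSPARK_PYTHON

    # small smoke test that actually runs on the executors
    print(sc.parallelize(range(100)).sum())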

Upvotes: 2
