Reputation: 1194

Error running PySpark, cannot connect to master

Hi I have the Following python code:

from __future__ import print_function

import sys

from pyspark.sql import SparkSession

from data import Data

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: runner <number_of_executors>", file=sys.stderr)
        exit(-1)

    spark = SparkSession \
        .builder \
        .master("spark://138.xxx.xxx.xxx:7077") \
        .config("spark.num-executors", sys.argv[1]) \
        .config("spark.driver.memory", "1g") \
        .config("spark.executor.memory", "1g") \
        .config("spark.executor.cores", "4") \
        .appName("APP") \
        .getOrCreate()

    data = Data(spark)

    spark.stop()

Where Data class will load various csv files, but that's not important.

I have the following lines added to ~/.bash_profile:

export SPARK_HOME=/home/tsar/spark
export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$SPARK_HOME/conf:$PATH
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build

I also have the following conf files:

slaves with a list of nodes

spark-defaults.conf

spark.master                       spark://138.xxx.xxx.xxx:7077
spark.driver.memory                1g
spark.executor.memory              1g
spark.executor.cores               4

spark-env.sh

export SPARK_MASTER_HOST=138.xxx.xxx.xxx
export SPARK_MASTER_MEMORY=5g
export SPARK_WORKER_MEMORY=1024m

What happens next is:

pyspark --master 138.xxx.xxx.xxx:7077
- launches pyspark and connects it to master
spark-submit --num-executors 17 main.py 4
- ignores configuration inside python, which is unexpected, and takes arguments from conf file, unless overwritten by command line options, connects to master and executes code
python main.py 3
- the option I want to use, fails to connect to master withe the stacktrace below

    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    17/03/26 19:58:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    17/03/26 19:59:12 ERROR StandaloneSchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
    17/03/26 19:59:12 WARN StandaloneSchedulerBackend: Application ID is not initialized yet.
    17/03/26 19:59:12 WARN StandaloneAppClient$ClientEndpoint: Drop UnregisterApplication(null) because has not yet connected to master
    17/03/26 19:59:12 ERROR TransportResponseHandler: Still have 3 requests outstanding when connection from /xxx.xxx.xxx.xxx:7077 is closed
    17/03/26 19:59:12 ERROR SparkContext: Error initializing SparkContext.
    java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
        at scala.Predef$.require(Predef.scala:224)
        at org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:91)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:524)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:236)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:745)
    Traceback (most recent call last):
      File "main.py", line 21, in <module>
        .appName("CS5052-01 Processing") \
      File "/cs/home/bbt/spark/python/pyspark/sql/session.py", line 169, in getOrCreate
        sc = SparkContext.getOrCreate(sparkConf)
      File "/cs/home/bbt/spark/python/pyspark/context.py", line 307, in getOrCreate
        SparkContext(conf=conf or SparkConf())
      File "/cs/home/bbt/spark/python/pyspark/context.py", line 118, in __init__
        conf, jsc, profiler_cls)
      File "/cs/home/bbt/spark/python/pyspark/context.py", line 179, in _do_init
        self._jsc = jsc or self._initialize_context(self._conf._jconf)
      File "/cs/home/bbt/spark/python/pyspark/context.py", line 246, in _initialize_context
        return self._jvm.JavaSparkContext(jconf)
      File "/cs/home/bbt/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1401, in __call__
      File "/cs/home/bbt/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
    : java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
        at scala.Predef$.require(Predef.scala:224)
        at org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:91)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:524)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:236)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:745)

How does last way to run differ from the others? spark master 100% exists at the IP address and is accessible.

Upvotes: 4

Answers (2)

tschomacker

Reputation: 814

I faced a similar situation. In my case it was Spark version mismatch. My SparkStreaming ran on 2.2.1 and my spark master on 3.0.1. Bringing both to the same version fixed the problem for me. See this question: Running a python Apache Beam Pipeline on Spark

Upvotes: 1

John Leonard

Reputation: 929

I had a similar problem in that master and slaves could definitely communicate but when I went to run a job I got the same obscure error:

java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem

The normal explanation for this is that master & slave cannot communicate but in my case the user I was running spark as on the slave node did not have permission to write to the log file. I found this by going to the spark master page (default master:8080) and drilling down to the slave's workers stderr output from the failed application link.

Are there any more details about the error in the stderr or in the spark logs (default /opt/spark/logs/spark-xxx)?

Upvotes: 2

Error running PySpark, cannot connect to master

Answers (2)

Related Questions