Reputation: 462
I am trying to follow this Python notebook. I installed Spark directly in the notebook (!pip install pyspark), but when I run:
from pyspark.sql import SparkSession

# spark config
spark = SparkSession \
    .builder \
    .appName("question recommendation") \
    .config("spark.driver.maxResultSize", "96g") \
    .config("spark.driver.memory", "96g") \
    .config("spark.executor.memory", "8g") \
    .config("spark.master", "local[12]") \
    .getOrCreate()
sc = spark.sparkContext
I get a RuntimeError on the first line:
RuntimeError Traceback (most recent call last)
<ipython-input-17-1b87e1472109> in <module>
1 # spark config
----> 2 spark = SparkSession \
3 .builder \
4 .appName("question recommendation") \
5 .config("spark.driver.maxResultSize", "96g") \
~\anaconda3\lib\site-packages\pyspark\sql\session.py in getOrCreate(self)
226 sparkConf.set(key, value)
227 # This SparkContext may be an existing one.
--> 228 sc = SparkContext.getOrCreate(sparkConf)
229 # Do not update `SparkConf` for existing `SparkContext`, as it's shared
230 # by all sessions.
~\anaconda3\lib\site-packages\pyspark\context.py in getOrCreate(cls, conf)
390 with SparkContext._lock:
391 if SparkContext._active_spark_context is None:
--> 392 SparkContext(conf=conf or SparkConf())
393 return SparkContext._active_spark_context
394
~\anaconda3\lib\site-packages\pyspark\context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
142 " is not allowed as it is a security risk.")
143
--> 144 SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
145 try:
146 self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
~\anaconda3\lib\site-packages\pyspark\context.py in _ensure_initialized(cls, instance, gateway, conf)
337 with SparkContext._lock:
338 if not SparkContext._gateway:
--> 339 SparkContext._gateway = gateway or launch_gateway(conf)
340 SparkContext._jvm = SparkContext._gateway.jvm
341
~\anaconda3\lib\site-packages\pyspark\java_gateway.py in launch_gateway(conf, popen_kwargs)
106
107 if not os.path.isfile(conn_info_file):
--> 108 raise RuntimeError("Java gateway process exited before sending its port number")
109
110 with open(conn_info_file, "rb") as info:
RuntimeError: Java gateway process exited before sending its port number
I am very new to Apache Spark. Is there anything I have installed incorrectly? Should I have installed it via Conda? Is there anything on my system that I need to check?
Upvotes: 0
Views: 3329
Reputation: 619
The main clue is in the last line of the traceback:
"RuntimeError: Java gateway process exited before sending its port number"
See this older Stack Overflow question for a solution:
Pyspark: Exception: Java gateway process exited before sending the driver its port number
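That message generally means the JVM that PySpark tries to start never came up. One common cause (not confirmed for this particular machine) is that Java is missing or cannot be found, since pip install pyspark does not install a JVM. As a rough first check, the sketch below reports whether a java executable is on PATH and whether JAVA_HOME is set. The helper name check_java_environment is made up for illustration and is not part of PySpark.

```python
import os
import shutil
import subprocess


def check_java_environment():
    """Collect basic facts about the local Java setup (hypothetical helper)."""
    java_path = shutil.which("java")         # None if no java executable is on PATH
    java_home = os.environ.get("JAVA_HOME")  # None if the variable is unset
    version_output = None
    if java_path is not None:
        try:
            # `java -version` normally prints its banner to stderr, not stdout.
            result = subprocess.run(
                [java_path, "-version"], capture_output=True, text=True
            )
            version_output = (result.stderr or result.stdout).strip()
        except OSError:
            # The executable was found but could not be started.
            version_output = None
    return {
        "java_on_path": java_path,
        "JAVA_HOME": java_home,
        "version": version_output,
    }


if __name__ == "__main__":
    for key, value in check_java_environment().items():
        print(f"{key}: {value}")
```

If java_on_path and JAVA_HOME both come back as None, installing a JDK supported by your Spark version and restarting the notebook kernel is a reasonable next step. If Java is present, the linked question covers other causes worth ruling out.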
Upvotes: 2