Reputation: 831
I'm facing an issue when trying to use pyspark=3.1.2. I have Java 1.8 installed and added to my user PATH, and according to the docs PySpark does not need any other dependency. My question is: do I have to install anything else, like Spark itself or something? I'm using conda environments in VS Code.
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
k:\Deep Learning\Github\stock-pred\test_spark.ipynb Cell 2' in <cell line: 1>()
----> 1 spark = SparkSession \
2 .builder \
3 .appName("test-wretrwrwe") \
4 .getOrCreate()
File ~\anaconda3\envs\prepro\lib\site-packages\pyspark\sql\session.py:228, in SparkSession.Builder.getOrCreate(self)
226 sparkConf.set(key, value)
227 # This SparkContext may be an existing one.
--> 228 sc = SparkContext.getOrCreate(sparkConf)
229 # Do not update `SparkConf` for existing `SparkContext`, as it's shared
230 # by all sessions.
231 session = SparkSession(sc)
File ~\anaconda3\envs\prepro\lib\site-packages\pyspark\context.py:384, in SparkContext.getOrCreate(cls, conf)
382 with SparkContext._lock:
383 if SparkContext._active_spark_context is None:
--> 384 SparkContext(conf=conf or SparkConf())
385 return SparkContext._active_spark_context
File ~\anaconda3\envs\prepro\lib\site-packages\pyspark\context.py:144, in SparkContext.__init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
139 if gateway is not None and gateway.gateway_parameters.auth_token is None:
140 raise ValueError(
141 "You are trying to pass an insecure Py4j gateway to Spark. This"
142 " is not allowed as it is a security risk.")
--> 144 SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
145 try:
146 self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
147 conf, jsc, profiler_cls)
File ~\anaconda3\envs\prepro\lib\site-packages\pyspark\context.py:331, in SparkContext._ensure_initialized(cls, instance, gateway, conf)
329 with SparkContext._lock:
330 if not SparkContext._gateway:
--> 331 SparkContext._gateway = gateway or launch_gateway(conf)
332 SparkContext._jvm = SparkContext._gateway.jvm
334 if instance:
File ~\anaconda3\envs\prepro\lib\site-packages\pyspark\java_gateway.py:108, in launch_gateway(conf, popen_kwargs)
105 time.sleep(0.1)
107 if not os.path.isfile(conn_info_file):
--> 108 raise Exception("Java gateway process exited before sending its port number")
110 with open(conn_info_file, "rb") as info:
111 gateway_port = read_int(info)
Exception: Java gateway process exited before sending its port number
Upvotes: 0
Views: 664
Reputation: 132
Using Windows as an example.
Method 1 (temporary, per-session):
import os
os.environ['JAVA_HOME'] = r"C:\Program Files\Java\jdk1.8.0_331"  # raw string so the backslashes are kept literally
Method 2:
Open the system environment variables dialog and add a new variable named "JAVA_HOME" with the value "C:\Program Files\Java\jdk1.8.0_331", then restart your terminal or VS Code so the change is picked up.
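Either way, the key point is that JAVA_HOME must be set before the SparkSession is created, because PySpark launches the Java gateway as a subprocess. A minimal sketch of the check (the JDK path is the hypothetical one from above; substitute your own install directory):

```python
import os

# Hypothetical JDK location -- substitute your actual install path.
java_home = r"C:\Program Files\Java\jdk1.8.0_331"

# Must be set BEFORE SparkSession.builder.getOrCreate() runs,
# so that pyspark's launch_gateway can find the java executable.
os.environ["JAVA_HOME"] = java_home

# Optional sanity check on Windows: this file should exist.
print(os.path.join(java_home, "bin", "java.exe"))
```

If the path printed here does not point at a real java.exe, the gateway process will exit immediately and you get exactly the "Java gateway process exited before sending its port number" error from the question.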
Upvotes: 0