Reputation: 33
I would like to use Spyder with pyspark (spark-2.1.1), but I cannot get past a rather frustrating Java error. I launch Spyder from the command line in Windows 10 after activating a conda environment (the Python version is 3.5.3). This is my code:
import pyspark

sc = pyspark.SparkContext("local")
file = sc.textFile("C:/test.log")
words = file.flatMap(lambda line: line.split(" "))
words.count()
When I try to define sc, I get the following error:
File "D:\spark-2.1.1-bin-hadoop2.7\python\pyspark\java_gateway.py", line 95, in launch_gateway
raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
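A sanity check one could run from the same console (just a debugging sketch; the exception above means the gateway's java process died or never started):

import subprocess

# The Py4J gateway spawns a java subprocess; if even this fails,
# the gateway cannot start. Note this uses the java found on PATH.
# 'java -version' writes to stderr, hence the redirect.
print(subprocess.check_output(['java', '-version'],
                              stderr=subprocess.STDOUT).decode())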
For the sake of completeness:
1) If I run pyspark from the command line after activating the conda environment, it works and correctly performs the word count task.
2) If I launch the Spyder desktop app from the Start Menu in Windows 10, everything works (but then I don't think I can load the right Python modules from my conda environment).
The related environment variables seem to be ok:
echo %SPARK_HOME%
D:\spark-2.1.1-bin-hadoop2.7
echo %JAVA_HOME%
C:\Java\jdk1.8.0_121
echo %PYTHONPATH%
D:\spark-2.1.1-bin-hadoop2.7\python;D:\spark-2.1.1-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip; D:\spark-2.1.1-bin-hadoop2.7\python\lib; C:\Users\user\Anaconda3
I have already tried the solutions proposed here, but nothing worked for me. Any suggestion is greatly appreciated!
Upvotes: 1
Views: 1177
Reputation: 115
Since 1) is working, it is probably best to use the conda environment in Spyder.
In Preferences, go to the "Python Interpreter" section and switch from "Default (i.e. the same as Spyder's)" to "Use the following Python interpreter".
If your environment is called spark_env and Anaconda is installed under C:\Program Files\Continuum\Anaconda, the Python interpreter corresponding to this environment is C:\Program Files\Continuum\Anaconda\envs\spark_env\python.exe.
A Python console started in Spyder after this change will run in your conda environment (note that this does not apply to IPython consoles).
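To confirm that a console is really running in the environment you selected, a minimal check (not Spyder-specific) is:

import sys

# Should print the interpreter chosen in Preferences, e.g.
# C:\Program Files\Continuum\Anaconda\envs\spark_env\python.exe
print(sys.executable)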
To check the environment variables, you can run Python code to make sure these are the same values your script sees:
from os import environ

print(environ['SPARK_HOME'])
print(environ['JAVA_HOME'])
try:
    print(environ['PYSPARK_SUBMIT_ARGS'])
except KeyError:
    # An unset PYSPARK_SUBMIT_ARGS is fine; see
    # https://github.com/ContinuumIO/anaconda-issues/issues/1276#issuecomment-277355043
    print("PYSPARK_SUBMIT_ARGS is not set -- no problem")
Hope that helps.
Upvotes: 0