Reputation: 8865
When I attempt to interact with Spark via pyspark, I get the following error.
java.io.IOException: Cannot run program "/Users/jwayne/anaconda/envs/ds/bin/python2.7": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:161)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:87)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:63)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:134)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:101)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 14 more
My code looks like the following.
from pyspark.sql.types import Row
records = [Row(fname='john{}'.format(i), lname='doe{}'.format(i)) for i in range(10)]
rdd = sc.parallelize(records)
sdf = rdd.toDF()
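As a quick diagnostic (a minimal sketch, assuming the sc provided by the pyspark shell), the following shows which interpreter the driver and the executors actually use:
import sys
print(sys.executable)  # interpreter used by the driver
# run a trivial two-partition job and ask each executor for its interpreter
print(sc.parallelize(range(2), 2).map(lambda _: sys.executable).collect())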
Before I start pyspark, I type in the following.
export PYSPARK_PYTHON="/Users/jwayne/anaconda/envs/ds/bin/python"
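(Note that Spark also reads PYSPARK_DRIVER_PYTHON for the driver process and falls back to PYSPARK_PYTHON when it is unset; the extra export below is an illustrative addition, pinning the driver explicitly to the same interpreter.)
export PYSPARK_DRIVER_PYTHON="/Users/jwayne/anaconda/envs/ds/bin/python"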
I then start pyspark as follows.
pyspark --master spark://master:7077
If I type which python, I get the following output.
/Users/jwayne/anaconda/envs/ds/bin/python
Typing /usr/bin/env python or /usr/bin/env python2.7, I get the following output.
Python 2.7.13 |Anaconda 4.3.1 (x86_64)| (default, Dec 20 2016, 23:05:08)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
I am using conda to manage my Python environments. Before I execute anything, I have already made sure to activate the right environment with source activate ds. If I type /Users/jwayne/anaconda/envs/ds/bin/python2.7 or /Users/jwayne/anaconda/envs/ds/bin/python, I do get the Python REPL.
My Spark cluster (v1.6.1), however, is NOT using conda: which python returns /usr/bin/python, and python --version returns Python 2.6.6. Am I also supposed to install conda on my Spark cluster? Looking at the stack trace, it seems this problem occurs before it ever hits the Spark cluster; it appears to be happening on the driver side. And as far as I can tell, that file/path does exist.
Any ideas on what I'm doing wrong?
Upvotes: 1
Views: 5370
Reputation: 474
I faced the exact same issue and fixed it as follows: check for lingering Spark JVM processes with the jps -m command, and kill them if required.
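For example (a sketch; 12345 stands in for whatever PID jps actually reports):
jps -m        # list running JVM processes with their command-line arguments
kill 12345    # hypothetical PID; stop a stale Master/Worker process if one is lingering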
Upvotes: 0
Reputation: 8865
The problem was on the server side. After I installed conda on the server, it worked. The exception didn't make it clear whether this was a server-side or client-side problem.
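For anyone hitting the same thing: one common way to pin the interpreter cluster-wide (a sketch; /opt/anaconda/envs/ds is an assumed install location, so adjust the path, which must exist on every node) is to set PYSPARK_PYTHON in conf/spark-env.sh on each node:
# conf/spark-env.sh on every node (path shown is an assumed example)
export PYSPARK_PYTHON=/opt/anaconda/envs/ds/bin/python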
Upvotes: 0