Jane Wayne

Reputation: 8865

PySpark cannot run python2.7, No such file or directory

When I attempt to interact with Spark via pyspark, I get the following error.

java.io.IOException: Cannot run program "/Users/jwayne/anaconda/envs/ds/bin/python2.7": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:161)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:87)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:63)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:134)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:101)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
    at java.lang.ProcessImpl.start(ProcessImpl.java:134)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
    ... 14 more

My code looks like the following.

from pyspark.sql.types import Row
records = [Row(fname='john{}'.format(i), lname='doe{}'.format(i)) for i in range(10)]
rdd = sc.parallelize(records)
sdf = rdd.toDF()

Before I start pyspark, I type the following.

export PYSPARK_PYTHON="/Users/jwayne/anaconda/envs/ds/bin/python"

I then start pyspark as follows.

pyspark --master spark://master:7077

If I type which python, I get the following output.

/Users/jwayne/anaconda/envs/ds/bin/python

Typing /usr/bin/env python or /usr/bin/env python2.7, I get the following output.

Python 2.7.13 |Anaconda 4.3.1 (x86_64)| (default, Dec 20 2016, 23:05:08) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org

I am using conda to manage my Python environments. Before I execute anything, I have already made sure to activate the right environment: source activate ds. If I type in /Users/jwayne/anaconda/envs/ds/bin/python2.7 or /Users/jwayne/anaconda/envs/ds/bin/python, I do get the Python REPL.
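A quick driver-side sanity check is to compare the interpreter the shell is actually running with the one PySpark was told to use. This is a minimal sketch (not part of the original question), assuming PYSPARK_PYTHON was exported as shown above:

```python
import os
import sys

# Print the interpreter running this script (the driver) and the one
# PySpark will ask worker processes to launch. A mismatch, or a stale
# value pointing at a path that no longer exists, reproduces the error.
print("driver interpreter:", sys.executable)
print("PYSPARK_PYTHON    :", os.environ.get("PYSPARK_PYTHON", "(not set)"))
```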

My Spark cluster (v1.6.1), however, is NOT using conda. There, which python returns /usr/bin/python and python --version returns Python 2.6.6. Am I also supposed to install conda on my Spark cluster? Looking at the stack trace, it seems this problem occurs before the job hits the Spark cluster; it seems to be happening on the driver side. As far as I can tell, the file/path does exist.

Any ideas on what I'm doing wrong?

Upvotes: 1

Views: 5370

Answers (2)

onlyvinish

Reputation: 474

I faced the exact same issue and fixed it with the following steps:

  1. Stop all Spark services. (Confirm that all services have been stopped using the jps -m command; use kill if required.)
  2. Make sure the PATH is set to pick up the Anaconda Python on all nodes. (Add it in the .bashrc or .bash_profile file.)
  3. Start the Spark services and verify.
  4. Open the pyspark shell to confirm which Python it is using.

Upvotes: 0

Jane Wayne

Reputation: 8865

The problem was on the server side. After I installed conda on the server, it worked. The exception didn't make it clear whether this was a server-side or a client-side problem.
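In hindsight, a simple existence check run on each node would have caught this before starting Spark. A minimal sketch (my own illustration, not part of the original answer), assuming PYSPARK_PYTHON is exported the same way on every host:

```python
import os

# error=2 (ENOENT) from ProcessBuilder means the exec'd path does not
# exist on the machine that tried to fork the Python worker. Running
# this on each node shows which side is missing the interpreter.
path = os.environ.get("PYSPARK_PYTHON", "/usr/bin/python")
print(path, "->", "ok" if os.path.exists(path) else "MISSING")
```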

Upvotes: 0
