Reputation: 83
I entered these commands in the pyspark shell:
In [1]: myrdd = sc.textFile("Cloudera-cdh5.repo")
In [2]: myrdd.map(lambda x:x.upper()).collect()
When I execute myrdd.map(lambda x:x.upper()).collect(), I get the following error:
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, tiger): java.io.IOException: Cannot run program "/usr/local/bin/python3": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:160)
at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:86)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:135)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:73)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
at java.lang.ProcessImpl.start(ProcessImpl.java:130)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
... 13 more
The file /usr/local/bin/python3 does exist on disk.
How can I solve this error?
Upvotes: 8
Views: 18141
Reputation: 239
I am using Windows 10 and faced the same issue. I fixed it simply by copying python.exe, renaming the copy to python3.exe, and making sure the folder containing python.exe is on the PATH environment variable.
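A minimal sketch of that copy step, assuming Python is installed under C:\Python39 (an assumed install directory; adjust to yours):
# make a python3.exe copy so Spark can launch the interpreter by that name
# C:\Python39 is an assumed install directory, not taken from the question
import shutil
shutil.copy(r"C:\Python39\python.exe", r"C:\Python39\python3.exe")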
Upvotes: 4
Reputation: 71
For those using Windows: create a spark-env.cmd file in your conf directory and put the following line inside it:
set PYSPARK_PYTHON=C:\Python39\python.exe
This Stack Overflow answer explains how to set environment variables for PySpark on Windows.
Upvotes: 7
Reputation: 11
You can also point the python command at python3:
sudo alternatives --set python /usr/bin/python3
python --version
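After repointing python, a quick sanity check from the pyspark shell (a sketch; it assumes sc is the running SparkContext) shows which interpreter the executors actually launch:
# each task reports the path of the Python interpreter it runs in
sc.parallelize([1]).map(lambda _: __import__("sys").executable).collect()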
Upvotes: 0
Reputation: 11597
More "stupidly", rather than a permission issue, it can just be that you do not have python3 installed or the path variable for it may be wrong.
Upvotes: 0
Reputation: 3512
You need to grant access permissions on /usr/local/bin/python3; you can use the command sudo chmod 777 /usr/local/bin/python3.
I also think this issue can be caused by the PYSPARK_PYTHON variable, which points to the Python location for every node. You can use the command below:
export PYSPARK_PYTHON=/usr/local/bin/python3
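Equivalently, a minimal sketch for a standalone driver script that sets the variable before the SparkContext is created (the app name is hypothetical; the file name is taken from the question):
import os
os.environ["PYSPARK_PYTHON"] = "/usr/local/bin/python3"  # must exist on every node
from pyspark import SparkContext
sc = SparkContext(appName="upper-demo")  # hypothetical app name
print(sc.textFile("Cloudera-cdh5.repo").map(lambda x: x.upper()).collect())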
Upvotes: 4