NFC

Reputation: 199

"Python was not found but can be installed" when using spark-submit on Windows

I have installed PySpark on Windows following the steps described here, with Spark version 3.1.2 and the package type pre-built for Apache Hadoop 2.7; the Python version is 3.9.6.

I wanted to try spark-submit with the wordcount example, so I opened a Command Prompt in the SPARK_HOME directory and entered this:

 bin\spark-submit examples\src\main\python\wordcount.py README.md

However, I got this message:

Python was not found but can be installed from the Microsoft Store: ms-windows-store://pdp/?productid=9NJ46SX7X90P 

I don't know what is wrong. I made sure Python was added to PATH when I installed it, and the command bin\pyspark seems to work correctly as well. I have also tried going to Settings > Apps > App execution aliases and disabling all the Python options, but that doesn't work either.

Edit: This is the error message I get if I try the App execution aliases method:

Exception in thread "main" java.io.IOException: Cannot run program "python3": CreateProcess error=2, The system cannot find the file specified
        at java.lang.ProcessBuilder.start(Unknown Source)
        at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:97)
        at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
        at java.lang.ProcessImpl.create(Native Method)
        at java.lang.ProcessImpl.<init>(Unknown Source)
        at java.lang.ProcessImpl.start(Unknown Source)
        ... 15 more

Upvotes: 9

Views: 5689

Answers (1)

Fareed Khan

Reputation: 2923

It's late, but here is the solution.


  1. Open the Environment Variables dialog by typing "environment variables" into the Windows search box.


  2. Create a new environment variable named PYSPARK_PYTHON with the value python (a programmatic alternative is sketched after these steps).


  3. Check whether the solution works.

After launching the PySpark shell, an action such as a.take() only works once PySpark is able to find Python.

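If you prefer not to use the GUI, the same effect can be had from code. This is a minimal sketch and not part of the original answer: it assumes a python launcher is resolvable on PATH, sets PYSPARK_PYTHON (and PYSPARK_DRIVER_PYTHON) before a session is created, then runs a small take() to confirm that Spark can start Python workers.

    # Minimal sketch (assumption, not the answerer's exact setup): point PySpark
    # at the Python interpreter programmatically instead of via the dialog.
    # "python" must be on PATH; otherwise use a full path such as
    # r"C:\Python39\python.exe".
    import os

    os.environ["PYSPARK_PYTHON"] = "python"
    os.environ["PYSPARK_DRIVER_PYTHON"] = "python"

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("python-check").getOrCreate()

    # If the worker Python is found, this prints [0, 1, 2] instead of
    # failing with "Python was not found".
    print(spark.sparkContext.parallelize(range(5)).take(3))

    spark.stop()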

You can also confirm the fix by running wordcount.py with the command shown in the document you mentioned:

C:\spark-3.3.0-bin-hadoop3>bin\spark-submit examples\src\main\python\wordcount.py README.md

This is the output of the above command, i.e. the word counts for the file:

22/06/24 11:53:33 INFO SparkContext: Running Spark version 3.3.0

...

guide](https://spark.apache.org/contributing.html): 1
information: 1
get: 1
started: 1
contributing: 1
project.: 1

...
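For context, the bundled example does roughly the following. This is a condensed sketch for illustration, not the exact examples\src\main\python\wordcount.py that ships with Spark:

    # Condensed word-count sketch; the bundled wordcount.py is similar in spirit.
    import sys
    from operator import add
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("PythonWordCount").getOrCreate()

    # Read the file given on the command line (README.md in the question),
    # split each line into words, and count how often each word occurs.
    lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0])
    counts = (lines.flatMap(lambda line: line.split(" "))
                   .map(lambda word: (word, 1))
                   .reduceByKey(add))

    for word, count in counts.collect():
        print(f"{word}: {count}")

    spark.stop()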

Upvotes: 11
