Reputation: 601
I have been trying to install Spark (pyspark) on my Windows 10 machine for two weeks now, and I have realized that I need your help.
When I try to start 'pyspark' in the command prompt, I still receive the following error:
'pyspark' is not recognized as an internal or external command, operable program or batch file.
To me this hints at a problem with the PATH/environment variables, but I cannot find the root of the problem.
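As a quick way to check that hypothesis (this only uses the built-in Windows command-line tools), the following shows whether cmd can resolve pyspark at all and what is currently on the PATH:
where pyspark
echo %PATH%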
I have tried multiple tutorials but the best I found was the one by Michael Galarnyk. I followed his tutorial step by step:
Downloaded Spark 2.3.1 from the official website (I changed the commands accordingly, since Michael's tutorial uses a different version). In line with the tutorial, I moved it in the cmd prompt:
mv C:\Users\patri\Downloads\spark-2.3.1-bin-hadoop2.7.tgz C:\opt\spark\spark-2.3.1-bin-hadoop2.7.tgz
Then I untarred it:
gzip -d spark-2.3.1-bin-hadoop2.7.tgz
and
tar xvf spark-2.3.1-bin-hadoop2.7.tar
Downloaded winutils.exe for Hadoop 2.7.1 from GitHub:
curl -k -L -o winutils.exe https://github.com/steveloughran/winutils/raw/master/hadoop-2.7.1/bin/winutils.exe?raw=true
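Since HADOOP_HOME below points at the Spark folder, winutils.exe also has to end up in that folder's bin directory. Assuming curl left winutils.exe in the current directory, that step would look like this:
REM assumes winutils.exe was downloaded to the current directory
mv winutils.exe C:\opt\spark\spark-2.3.1-bin-hadoop2.7\bin\winutils.exe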
Set my environment variables accordingly:
setx SPARK_HOME C:\opt\spark\spark-2.3.1-bin-hadoop2.7
setx HADOOP_HOME C:\opt\spark\spark-2.3.1-bin-hadoop2.7
setx PYSPARK_DRIVER_PYTHON jupyter
setx PYSPARK_DRIVER_PYTHON_OPTS notebook
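One caveat with setx that may matter here: it only affects processes started afterwards, so the new values are not visible in the command prompt that ran these commands. Opening a fresh cmd window and echoing the variables confirms that they were stored:
echo %SPARK_HOME%
echo %HADOOP_HOME%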
Then I added C:\opt\spark\spark-2.3.1-bin-hadoop2.7\bin to my PATH variable. My user environment variables now look like this: [screenshot: current environment variables]
These actions should have done the trick, but when I run
pyspark --master local[2]
I still get the error from above. Can you help me track down this problem using the information above?
I ran a couple of checks in the command prompt to verify the setup.
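For example, checks along these lines (illustrative, not necessarily the exact commands used) confirm that the extracted folder exists, that the pyspark launcher is in its bin directory, and that the bin directory made it into the active PATH:
REM illustrative checks only
dir C:\opt\spark\spark-2.3.1-bin-hadoop2.7\bin\pyspark*
echo %PATH% | findstr /i "spark"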
Upvotes: 7
Views: 21021
Reputation: 31
Following the steps explained in my blog post will resolve your problem:
How to Setup PySpark on Windows https://beasparky.blogspot.com/2020/05/how-to-setup-pyspark-in-windows.html
To set up the environment paths for Spark, go to "Advanced System Settings" and set the paths below:
JAVA_HOME="C:\Program Files\Java\jdk1.8.0_181"
HADOOP_HOME="C:\spark-2.4.0-bin-hadoop2.7"
SPARK_HOME="C:\spark-2.4.0-bin-hadoop2.7"
Also, add their bin paths to the PATH system variable.
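Equivalently, assuming the same install locations shown above, the first three can be set from a command prompt with setx (by default setx writes user variables; add /M from an administrator prompt for system variables, and note the values only appear in newly opened windows). The two bin folders (%SPARK_HOME%\bin and %JAVA_HOME%\bin) are easiest to add through the same Environment Variables dialog, since setx truncates values longer than 1024 characters:
setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_181"
setx HADOOP_HOME "C:\spark-2.4.0-bin-hadoop2.7"
setx SPARK_HOME "C:\spark-2.4.0-bin-hadoop2.7"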
Upvotes: 2
Reputation: 314
I resolved this issue by setting the variables as "system variables" rather than "user variables". Note that running
pyspark --master local[2]
should then work (make sure winutils.exe is there); if that does not work, then you have other issues than just the environment variables.
Upvotes: 4