Reputation: 3732
I followed the instructions on this page to install databricks-connect on Windows 10 (Python 3.8, Databricks version 9.1) in order to connect to an Azure Databricks cluster:
https://towardsdatascience.com/get-started-spark-with-databricks-and-pyspark-72572179bd03
When I run:
databricks-connect test
I get this error:
* PySpark is installed at C:\Users\brend\miniconda3\envs\try-databricks-7.3\lib\site-packages\pyspark
* Checking SPARK_HOME
* Checking java version
java version "1.8.0_311"
Java(TM) SE Runtime Environment (build 1.8.0_311-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.311-b11, mixed mode)
* Skipping scala command test on Windows
* Testing python command
The system cannot find the path specified.
and it hangs indefinitely. Further investigation shows it is hanging inside the call to spark-submit.cmd (and hence inside spark-submit2.cmd, which that script calls).
I do not have any other Spark installation locally.
The problem has been replicated on databricks 7.3 and 9.1
What can I do to diagnose the problem further?
Upvotes: 2
Views: 1467
Reputation: 21
I hit this error because I mistakenly appended \bin at the end of the path in the JAVA_HOME environment variable after installing Java SE. The correct JAVA_HOME should be something like C:\java\Java\jre1.8.0_321, to which the Spark scripts will append \bin on their own.
To find out whether this is the case for you, add print statements to a couple of scripts to see which program Windows is unable to find.
Start with this:
Based on your command output, go to the folder C:\Users\brend\miniconda3\envs\try-databricks-7.3\lib\site-packages\pyspark\bin and open the script spark-submit2.cmd in a text editor. The first line says @echo off. Underneath this line, add a new line saying @echo on. Run databricks-connect test again and note the very last command printed on the screen before it fails.
In my case, I saw that the script was calling a bunch of other scripts, which eventually called C:\java\Java\jre1.8.0_321\bin\bin\java, which led to Windows being unable to find Java.
Once you fix your error, feel free to delete all the @echo on statements that you added.
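As a quick first check before editing any scripts, the JAVA_HOME mistake described above can be detected from Python. This is only a minimal sketch: the helper names has_trailing_bin and java_path_used_by_scripts are hypothetical, and it assumes (as described in this answer) that the Spark scripts build the Java path as JAVA_HOME followed by \bin\java. It uses ntpath so that Windows-style paths are parsed the same way on any platform.

```python
import ntpath
import os


def has_trailing_bin(java_home):
    """Return True if the JAVA_HOME value ends in a 'bin' directory."""
    trimmed = java_home.rstrip("\\/")
    return ntpath.basename(trimmed).lower() == "bin"


def java_path_used_by_scripts(java_home):
    """Path the launch scripts are assumed to call: JAVA_HOME plus bin and java."""
    return ntpath.join(java_home.rstrip("\\/"), "bin", "java")


if __name__ == "__main__":
    java_home = os.environ.get("JAVA_HOME")
    if java_home is None:
        print("JAVA_HOME is not set")
    else:
        print("JAVA_HOME          :", java_home)
        print("scripts would call :", java_path_used_by_scripts(java_home))
        if has_trailing_bin(java_home):
            print("JAVA_HOME ends in bin; remove that last path component")
```

If the second printed path contains bin\bin, JAVA_HOME most likely needs the trailing \bin removed.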
Upvotes: 2
Reputation: 126
Please check the environment variable SPARK_HOME and try setting it to the path obtained with databricks-connect get-jar-dir, excluding the trailing "/jars".
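To illustrate the path manipulation, here is a small sketch that takes the text printed by databricks-connect get-jar-dir and drops the trailing jars component. The function name spark_home_from_jar_dir is hypothetical, and the script expects you to pass the printed path in yourself as a command-line argument; it does not run databricks-connect.

```python
import ntpath
import sys


def spark_home_from_jar_dir(jar_dir):
    """Strip a trailing 'jars' directory from the given path, if present."""
    # Remove surrounding whitespace (e.g. a newline) and trailing separators.
    trimmed = jar_dir.strip().rstrip("\\/")
    if ntpath.basename(trimmed).lower() == "jars":
        return ntpath.dirname(trimmed)
    # Leave the path unchanged if it does not end in 'jars'.
    return trimmed


if __name__ == "__main__":
    # Usage: pass the output of `databricks-connect get-jar-dir` as the argument.
    if len(sys.argv) > 1:
        print(spark_home_from_jar_dir(sys.argv[1]))
```

The printed value is what SPARK_HOME would then be set to before running databricks-connect test again.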
Upvotes: 0