Brendan Hill

Reputation: 3732

Databricks connect test hangs indefinitely on "The system cannot find the path specified."

I followed the instructions on this page to install databricks-connect on Windows 10 (Python 3.8, Databricks version 9.1) so that I can connect to an Azure Databricks cluster:

https://towardsdatascience.com/get-started-spark-with-databricks-and-pyspark-72572179bd03

When I run:

databricks-connect test

I get this error:

* PySpark is installed at C:\Users\brend\miniconda3\envs\try-databricks-7.3\lib\site-packages\pyspark
* Checking SPARK_HOME
* Checking java version
java version "1.8.0_311"
Java(TM) SE Runtime Environment (build 1.8.0_311-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.311-b11, mixed mode)
* Skipping scala command test on Windows
* Testing python command
The system cannot find the path specified.

and then it hangs indefinitely. Further investigation shows that it hangs inside the call to spark-submit.cmd (and therefore inside spark-submit2.cmd, which it calls).

I do not have any other Spark installation locally.

I have reproduced the problem on Databricks 7.3 and 9.1.

What can I do to diagnose the problem further?

Upvotes: 2

Views: 1467

Answers (2)

Michael Livshutz

Reputation: 21

I hit this error because I mistakenly appended \bin to the path in the JAVA_HOME environment variable after installing Java SE. The correct JAVA_HOME should be something like C:\java\Java\jre1.8.0_321; the Spark scripts append \bin to it on their own.

To find out whether this applies to you, add echo statements to a couple of the scripts to see which program Windows is unable to find.

Start with this:

As per your command output, go to the folder C:\Users\brend\miniconda3\envs\try-databricks-7.3\lib\site-packages\pyspark\bin and open the script spark-submit2.cmd in a text editor. The first line is @echo off. Directly below it, add a new line containing @echo on. Run databricks-connect test again and note the very last command printed on the screen before it fails.
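If you would rather not edit the script by hand, the same change can be sketched in Python. This is only an illustration: add_echo_on is a hypothetical helper name, not part of Spark or databricks-connect, and it works on the script text only, so reading and writing the .cmd file (and keeping a backup) is left to you.

```python
def add_echo_on(script_text):
    """Insert '@echo on' right after the first '@echo off' line.

    Hypothetical helper: it only transforms text and keeps the line
    ending style of the '@echo off' line it finds.
    """
    out = []
    inserted = False
    for line in script_text.splitlines(keepends=True):
        out.append(line)
        if not inserted and line.strip().lower() == "@echo off":
            body = line.rstrip("\r\n")
            # Reuse the existing line ending; assume CRLF if there is none.
            ending = line[len(body):] or "\r\n"
            if not line.endswith(("\r", "\n")):
                # '@echo off' was the last line and had no line ending.
                out[-1] = line + ending
            out.append("@echo on" + ending)
            inserted = True
    return "".join(out)
```

Text without an @echo off line is returned unchanged, so the helper is safe to run on scripts that do not need the edit.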

In my case, the script called a chain of other scripts that eventually tried to run C:\java\Java\jre1.8.0_321\bin\bin\java. That path does not exist, so Windows could not find Java.

Once you fix your error, feel free to delete all the @echo on statements that you added.
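A quicker first check for the specific mistake described in this answer is to look at JAVA_HOME directly. The snippet below is a minimal sketch: check_java_home is a hypothetical helper name, and it only detects a trailing \bin, not other causes of "The system cannot find the path specified."

```python
import os
from pathlib import PureWindowsPath


def check_java_home(java_home):
    """Return a list of problems found in a JAVA_HOME value (hypothetical helper)."""
    if not java_home:
        return ["JAVA_HOME is not set"]
    problems = []
    # PureWindowsPath parses Windows-style paths on any operating system.
    path = PureWindowsPath(java_home.strip().strip('"'))
    if path.name.lower() == "bin":
        # The Spark scripts append \bin themselves, giving ...\bin\bin\java.
        problems.append(f"JAVA_HOME ends with \\bin; try {path.parent} instead")
    return problems


if __name__ == "__main__":
    found = check_java_home(os.environ.get("JAVA_HOME"))
    for message in found or ["JAVA_HOME does not end with \\bin"]:
        print(message)
```

Run it in the same conda environment and terminal you use for databricks-connect test, so that it sees the same environment variables.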

Upvotes: 2

sander-db

Reputation: 126

Please check the environment variable SPARK_HOME and try setting that to the path obtained with databricks-connect get-jar-dir, excluding the trailing "/jars".
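The "excluding the trailing /jars" step can be sketched as follows. spark_home_from_jar_dir is a hypothetical helper name, and the example path in the comment is an assumption based on the PySpark location shown in the question, not actual output of databricks-connect get-jar-dir.

```python
from pathlib import PureWindowsPath


def spark_home_from_jar_dir(jar_dir):
    """Drop a trailing 'jars' component from the jar directory (hypothetical helper)."""
    path = PureWindowsPath(jar_dir.strip())
    if path.name.lower() == "jars":
        return str(path.parent)
    # No trailing 'jars' component: return the path as given.
    return str(path)


# Assumed example input, for illustration only:
# spark_home_from_jar_dir(
#     r"C:\Users\brend\miniconda3\envs\try-databricks-7.3\lib\site-packages\pyspark\jars"
# )
```

Pass in whatever databricks-connect get-jar-dir prints on your machine, then set SPARK_HOME to the returned value before running databricks-connect test again.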

Upvotes: 0
