Miłosz Tadrzak

Reputation: 41

Databricks Connect: can't connect to remote cluster on azure, command: 'databricks-connect test' stops

I'm trying to set up Databricks Connect so that I can work with a remote Databricks cluster already running in a workspace on Azure. When I run the command 'databricks-connect test', it never finishes.

I followed the official documentation.

I've installed the most recent Anaconda (version 3.7) and created a local environment:

    conda create --name dbconnect python=3.5

I've installed 'databricks-connect' in version 5.1, which matches the configuration of my cluster on Azure Databricks:

    pip install -U databricks-connect==5.1.*

I've already run 'databricks-connect configure' as follows:

    (base) C:\>databricks-connect configure
    The current configuration is:
    * Databricks Host: ******.azuredatabricks.net
    * Databricks Token: ************************************
    * Cluster ID: ****-******-*******
    * Org ID: ****************
    * Port: 8787

After the above steps, I try to run the 'test' command for Databricks Connect:

    databricks-connect test

and the procedure starts but stops after a warning about MetricsSystem, as shown below:

    (dbconnect) C:\>databricks-connect test
    * PySpark is installed at c:\users\miltad\appdata\local\continuum\anaconda3\envs\dbconnect\lib\site-packages\pyspark
    * Checking java version
    java version "1.8.0_181"
    Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
    * Testing scala command
    19/05/31 08:14:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    19/05/31 08:14:34 WARN MetricsSystem: Using default name SparkStatusTracker for source because neither spark.metrics.namespace nor spark.app.id is set. 

I expect the process to continue to the next steps, as in the official documentation:

    * Testing scala command
    18/12/10 16:38:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    18/12/10 16:38:50 WARN MetricsSystem: Using default name SparkStatusTracker for source because neither spark.metrics.namespace nor spark.app.id is set.
    18/12/10 16:39:53 WARN SparkServiceRPCClient: Now tracking server state for 5abb7c7e-df8e-4290-947c-c9a38601024e, invalidating prev state
    18/12/10 16:39:59 WARN SparkServiceRPCClient: Syncing 129 files (176036 bytes) took 3003 ms
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.4.0-SNAPSHOT
          /_/

    Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_152)
    Type in expressions to have them evaluated.
    Type :help for more information.

So my process stops after 'WARN MetricsSystem: Using default name SparkStatusTracker'.

What am I doing wrong? Should I configure something more?

Upvotes: 4

Views: 3184

Answers (3)

Dhananjaya Jayashanka

Reputation: 21

Port 8787 was used for Azure in the past, but 15001 is now used for both Azure and AWS. Very old clusters may still be using 8787, but all new clusters use 15001. Change the port by running:

    databricks-connect configure

Enter the same configuration values, but set the port to 15001. After that I ran the 'test' command again,

    databricks-connect test

and then it worked.
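The port check itself can be automated. Below is a minimal sketch, assuming databricks-connect stores its settings as JSON in a `~/.databricks-connect` file (the path and the `port` key are assumptions based on typical installs, not a documented API):

```python
import json
import os

LEGACY_PORT = 8787     # old Azure-only default
CURRENT_PORT = 15001   # used by all new clusters on Azure and AWS

def needs_port_update(cfg):
    """Return True if the stored port is the legacy 8787 rather than 15001."""
    return cfg.get("port", CURRENT_PORT) == LEGACY_PORT

# Assumed location of the databricks-connect settings file; the check is
# skipped quietly if the file is not present.
cfg_path = os.path.expanduser("~/.databricks-connect")
if os.path.exists(cfg_path):
    with open(cfg_path) as f:
        if needs_port_update(json.load(f)):
            print("Legacy port 8787 configured; re-run "
                  "'databricks-connect configure' and set port 15001.")
```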

Upvotes: 0

Molly

Reputation: 49

It looks like this feature isn't officially supported on runtimes 5.3 or below. If there are limitations on updating the runtime, I would make sure the Spark conf is set as follows:

    spark.databricks.service.server.enabled true

However, with the older runtimes things might still be wonky. I would recommend doing this with runtime 5.5, or 6.1 and above.

Upvotes: 1

simon_dmorias

Reputation: 2473

Lots of people seem to be seeing this issue with the 'test' command on Windows, but if you actually use Databricks Connect, it works fine. It seems safe to ignore.
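One way to confirm the warning is harmless is to run a tiny job through Databricks Connect directly. A minimal sketch, assuming the bundled pyspark is installed and configured (`smoke_test` is a hypothetical helper name, not part of any API):

```python
def smoke_test(spark):
    """Run a trivial job that forces a round trip to the remote cluster."""
    return spark.range(10).count()

# With a configured databricks-connect install this would be driven by:
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   smoke_test(spark)  # should return 10 if the connection works
```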

Upvotes: 0
