Kaharon
Kaharon

Reputation: 395

Unable to load the data from a MS SQL database from Microsoft Azure Databricks notebook

I want to retrieve the data from a MS SQL database (not hosted on Azure) to Microsoft Azure Databricks notebook. Here are the steps of what I have done:

  1. go on the portal of azure and create a resource groups
  2. create Azure Databricks Service (but I do NOT use the 'Deploy Azure Databricks workspace in your own Virtual Network (VNet)' option → maybe I should...)
  3. once Azure Databricks Service is ready, I launch it and create a cluster without specific configuration
  4. then I create a notebook (running on the previous cluster) with this script
msSqlServer = "jdbc:sqlserver://xxx.xxx.xxx.xxx:1433;ApplicationIntent=readonly;databaseName=" + msSqlDatabase
query = """(select * from mytable)foo"""

df = (
  spark.read.format("jdbc")
  .option("url", msSqlServer)
  .option("dbtable", query)
  .option("user", msSqlUser)
  .option("password", msSqlPassword)
  .load()
)

and I get this error:

com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host xxx.xxx.xxx.xxx, port 1433 has failed. Error: "connect timed out. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".

Before asking on StackoverFlow, I have contacted my company network and DBA teams. The DBA said the connection is okay but then disconnects immediatly"

For your information, I have followed this tutorial https://learn.microsoft.com/en-us/azure/databricks/data/data-sources/sql-databases

Maybe there is something to configure, but I am not in networking at all (I am just a little data scientist who wants to play with notebook on azure databricks and access his company databases). For example how can I Make sure that TCP connections to the port are not blocked by a firewall ?

If you have some idea or you have already met this issue, feel free to share. :)

If you need more information, please tell me.

Upvotes: 0

Views: 1272

Answers (1)

CHEEKATLAPRADEEP
CHEEKATLAPRADEEP

Reputation: 12788

If you have already configured your Azure SQL database to listen on TCP/IP traffic on port 1433 then it could be any of following three reasons:

  • JDBC connection string in correct.
  • Firewall is blocking the incoming connection.
  • Azure SQL database is not running.

From Azure Portal get the Azure SQL Database JDBC connection string.

enter image description here

SQL Databases using JDBC using Python:

jdbcHostname = "chepra.database.windows.net"
jdbcDatabase = "chepra"
jdbcPort = "1433"
username = "chepra"
password = "XXXXXXXXXX"
jdbcUrl = "jdbc:sqlserver://{0}:{1};database={2}".format(jdbcHostname, jdbcPort, jdbcDatabase)
connectionProperties = {
  "user" : username,
  "password" : password,
  "driver" : "com.microsoft.sqlserver.jdbc.SQLServerDriver"
}
pushdown_query = "(Select * from customers where CustomerID = 2) CustomerID"
df = spark.read.jdbc(url=jdbcUrl, table=pushdown_query, properties=connectionProperties)
display(df)

enter image description here

SQL Databases using JDBC using Scala:

val jdbcHostname = "chepra.database.windows.net"
val jdbcPort = 1433
val jdbcDatabase = "chepra"

// Create the JDBC URL without passing in the user and password parameters.
val jdbcUrl = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${jdbcDatabase}"

// Create a Properties() object to hold the parameters.
import java.util.Properties
val connectionProperties = new Properties()

connectionProperties.put("user", s"chepra")
connectionProperties.put("password", s"XXXXXXXXXX")

val employees_table = spark.read.jdbc(jdbcUrl, "customers", connectionProperties)
employees_table.show()

enter image description here

Upvotes: 1

Related Questions