anthino12

Reputation: 958

Create Spark context from Python in order to run Databricks SQL

I've been following this tutorial, which lets me connect to Databricks from Python and then run Delta table queries. However, I've stumbled upon a problem. When I run it for the first time, I get the following error:

Container container-name in account storage-account.blob.core.windows.net not found, and we can't create it using anoynomous credentials, and no credentials found for them in the configuration.

When I go back to my Databricks cluster and run this code snippet

from pyspark import SparkContext

# Get the existing Spark context on the cluster (or create one)
spark_context = SparkContext.getOrCreate()

# StorageAccountName and StorageAccountAccessKey are defined elsewhere
if StorageAccountName is not None and StorageAccountAccessKey is not None:
    print('Configuring the spark context...')
    spark_context._jsc.hadoopConfiguration().set(
        f"fs.azure.account.key.{StorageAccountName}.blob.core.windows.net",
        StorageAccountAccessKey)

(where StorageAccountName and StorageAccountAccessKey are known) and then run my Python app again, it runs successfully without throwing the previous error. Is there a way to run this code snippet from my Python app so that the configuration is also reflected on my Databricks cluster?

Upvotes: 1

Views: 1256

Answers (1)

Alex Ott

Reputation: 87154

You just need to add this configuration option to the cluster itself, as described in the docs. Set the following Spark property, the same one you set in your code:

fs.azure.account.key.<storage-account-name>.blob.core.windows.net <storage-account-access-key>
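
Once that property is set on the cluster, reads against the container should work without the anonymous-credentials error. A minimal check from the Python app, using placeholder container, account, and table-path names:

# Placeholder names below; replace with your container, account, and table path.
df = spark.read.format("delta").load(
    "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<path-to-table>")
df.show()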

For security, it's better to put the access key into a secret scope and reference it from the Spark configuration (see docs).
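
As a sketch, assuming a hypothetical secret scope named storage-scope that holds the access key under the key storage-key, code running on the cluster can pull the key from the scope instead of hard-coding it:

# Hypothetical scope/key names; dbutils.secrets is available on Databricks clusters.
spark.conf.set(
    "fs.azure.account.key.<storage-account-name>.blob.core.windows.net",
    dbutils.secrets.get(scope="storage-scope", key="storage-key"))

Alternatively, the cluster's Spark config can reference the same secret declaratively:

fs.azure.account.key.<storage-account-name>.blob.core.windows.net {{secrets/storage-scope/storage-key}}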

Upvotes: 1
