user2333312

Reputation: 131

Pyspark error accessing temp storage dir in Azure Blob

I'm running in an Azure Synapse notebook and trying to use PySpark to read a SQL table. It seems to be able to read the table, but when I want to show the results, I get an error indicating that it can't access the temporary directory.

If I specify the temp directory using the "wasbs" schema, I get this error:

External file access failed due to internal error: 'Parameters provided to connect to the Azure storage account are not valid.

If I specify the temp directory with the abfss schema, I get this error:

CREATE EXTERNAL TABLE AS SELECT statement failed as the path name 'abfss://@.dfs.core.windows.net/temp/SQLAnalyticsConnectorStaging/...tbl' could not be used for export. Please ensure that the specified path is a directory which exists or can be created, and that files can be created in that directory.

The container name, account name, and account key are correct, so I'm guessing that I'm not setting the config correctly, but I've tried everything I could think of.

I've also set "hadoop" config by replacing the "fs.azure.account.key" with "spark.hadoop.fs.azure.account.key".

Code examples are below. I think it's successfully accessing the database because I'm able to show the columns using print("columns", df.columns). The error occurs when I try to show the data with print("head", df.head()).

Any help is appreciated.

from pyspark.sql import SparkSession
# Constants comes from the Azure Synapse Dedicated SQL Pool connector
from com.microsoft.spark.sqlanalytics.Constants import Constants

container = "container_name"
storage_account_name = "storage_account_name"
account_key = "account_key"
appName = "test"
master = "local"

spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .getOrCreate()

# Register the storage account key for the Blob endpoint
spark.conf.set(f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net", account_key)

# Read the table, staging intermediate data in the temp folder
df = spark.read \
    .option(Constants.TEMP_FOLDER, f"wasbs://{container}@{storage_account_name}.blob.core.windows.net/temp") \
    .synapsesql("db_name.schema_name.spark_test")

print("columns", df.columns)  # succeeds: only reads the schema
print("head", df.head())      # fails: triggers the actual data export/read
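For reference, the temp-folder URI uses a different endpoint depending on the scheme. A minimal sketch in plain Python (the container and account names are placeholders, not real resources) of how the two forms are built:

```python
def temp_folder_uri(scheme: str, container: str, storage_account: str, path: str = "temp") -> str:
    # wasbs targets the Blob endpoint; abfss targets the Data Lake (dfs) endpoint
    endpoint = "blob" if scheme == "wasbs" else "dfs"
    return f"{scheme}://{container}@{storage_account}.{endpoint}.core.windows.net/{path}"

print(temp_folder_uri("wasbs", "mycontainer", "myaccount"))
# wasbs://mycontainer@myaccount.blob.core.windows.net/temp
print(temp_folder_uri("abfss", "mycontainer", "myaccount"))
# abfss://mycontainer@myaccount.dfs.core.windows.net/temp
```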

Upvotes: 0

Views: 633

Answers (1)

Pratik Lad

Reputation: 8402

could not be used for export. Please ensure that the specified path is a directory which exists or can be created, and that files can be created in that directory.

This error occurs when PolyBase is unable to perform the export operation.

Causes:

  • A network error blocks access to Azure Blob storage on the required network ports.
  • The Azure storage account is configured incorrectly.

Resolution:

  • Enable outbound traffic from the CTL01 node over the supplied internet connection to *.blob.core.windows.net on local firewall ports 80 and 443.
  • Verify that the storage account is a standard storage account using either standard locally redundant storage (Standard-LRS) or standard geo-redundant storage (Standard-GRS), and that it is configured as General Purpose.
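In addition, the account key has to be registered under the Hadoop property name that matches the endpoint the temp folder uses. A minimal sketch (the storage account name is a placeholder; the property-name pattern follows the Hadoop Azure connectors):

```python
def account_key_property(storage_account: str, scheme: str) -> str:
    # wasbs reads the key from the blob endpoint property,
    # abfss reads it from the dfs endpoint property
    endpoint = "blob" if scheme == "wasbs" else "dfs"
    return f"fs.azure.account.key.{storage_account}.{endpoint}.core.windows.net"

# With a live SparkSession one would then set, for example:
#   spark.conf.set(account_key_property("mystorageacct", "abfss"), account_key)
print(account_key_property("mystorageacct", "abfss"))
# fs.azure.account.key.mystorageacct.dfs.core.windows.net
```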

Upvotes: 0
