Reputation: 79
Banging my head against the wall since I just can't write a parquet file to Azure Blob Storage. In my Azure Databricks notebook I basically: 1. read a CSV from the blob storage as a DataFrame and 2. attempt to write the DataFrame back to the same storage.
I am able to read the CSV, but I get the following error when I try to write the parquet file.
Here's the stack trace:
Job aborted due to stage failure: Task 0 in stage 8.0 failed 4 times, most recent failure: Lost task 0.3 in stage 8.0 (TID 20, 10.139.64.5, executor 0): shaded.databricks.org.apache.hadoop.fs.azure.AzureException: java.io.IOException at shaded.databricks.org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.storeEmptyFolder(AzureNativeFileSystemStore.java:1609) ... ... Caused by: com.microsoft.azure.storage.StorageException: The specified resource does not exist.
Here's my python code:
spark.conf.set("fs.azure.sas.my_container.my_storage.blob.core.windows.net", dbutils.secrets.get(scope = "my_scope", key = "my_key"))
df100 = spark.read.format("csv").option("header", "true").load("wasbs://my_container@my_storage.blob.core.windows.net/folder/revenue.csv")
df100.write.parquet('wasbs://my_container@my_storage.blob.core.windows.net/f1/deh.parquet')
Upvotes: 0
Views: 3432
Reputation: 382
One way you can do this is by mounting the storage on Databricks; after that, your storage will be accessible at a path like /mnt/yourstoragepath/folder1.
To do this, set the account name and the SAS key of the storage on Databricks:
spark.conf.set(
"fs.azure.account.key.<storage-account-name>.blob.core.windows.net",
"<storage-account-access-key>")
spark.conf.set(
"fs.azure.sas.<container-name>.<storage-account-name>.blob.core.windows.net",
"<complete-query-string-of-sas-for-the-container>")
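The mount step mentioned above isn't shown; a minimal sketch using `dbutils.fs.mount` (only runs on a Databricks cluster; the container, storage account, mount point, and secret scope/key names are placeholders, and this assumes the SAS token is stored in a Databricks secret scope):

```python
# Sketch: mount the blob container so it appears under /mnt/mountName.
# <container-name>, <storage-account-name>, "my_scope"/"my_key", and the
# mount point are placeholders -- substitute your own values.
dbutils.fs.mount(
    source="wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
    mount_point="/mnt/mountName",
    extra_configs={
        "fs.azure.sas.<container-name>.<storage-account-name>.blob.core.windows.net":
            dbutils.secrets.get(scope="my_scope", key="my_key")
    }
)
```

After mounting, files in the container are reachable at paths like /mnt/mountName/folder1.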
After setting this, try to read the file as shown below:
df = spark.read.parquet("wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<directory-name>")
or
dbutils.fs.ls("wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<directory-name>")
To write, use this syntax:
df.write.mode("overwrite").option("path","/mnt/mountName/folder1/tablename").saveAsTable("database.tablename")
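If the goal is just a parquet file (as in the question) rather than a managed table, writing directly to the mounted path should also work; a sketch assuming the container was mounted at /mnt/mountName (only runs on a Databricks cluster, and df100 is the DataFrame from the question):

```python
# Sketch: write the DataFrame as parquet to the mounted container path.
# /mnt/mountName and the target folder are placeholders for your own mount.
df100.write.mode("overwrite").parquet("/mnt/mountName/f1/deh.parquet")
```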
Please refer to this official link
Upvotes: 0