tako_tokyo

Reputation: 79

How is it that I can read from Azure Blob Storage but fail to write to it?

I've been banging my head against the wall because I just can't write a parquet file to Azure Blob Storage. In my Azure Databricks notebook I basically: 1. read a CSV from the blob storage into a dataframe, and 2. attempt to write the dataframe back to the same storage as parquet.

I am able to read the CSV, but I get the error below when I try to write the parquet file.

Here's the stack trace:

Job aborted due to stage failure: Task 0 in stage 8.0 failed 4 times, most recent failure: Lost task 0.3 in stage 8.0 (TID 20, 10.139.64.5, executor 0): shaded.databricks.org.apache.hadoop.fs.azure.AzureException: java.io.IOException at shaded.databricks.org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.storeEmptyFolder(AzureNativeFileSystemStore.java:1609) ... ... Caused by: com.microsoft.azure.storage.StorageException: The specified resource does not exist.

Here's my python code:

spark.conf.set("fs.azure.sas.my_container.my_storage.blob.core.windows.net", dbutils.secrets.get(scope = "my_scope", key = "my_key"))

# read csv

df100 = spark.read.format("csv").option("header", "true").load("wasbs://my_container@my_storage.blob.core.windows.net/folder/revenue.csv") 

# write parquet

df100.write.parquet('wasbs://my_container@my_storage.blob.core.windows.net/f1/deh.parquet')  

# end

Upvotes: 0

Views: 3432

Answers (1)

Doof

Reputation: 382

One way to do this is by mounting the storage on Databricks; after that, your storage will be accessible at a path like /mnt/yourstoragepath/folder1
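The mount call itself isn't shown above, so here is a rough sketch. The conf-key and wasbs URL patterns come from the snippets in this answer; the container, account, scope, and mount-point names are purely illustrative (the dbutils.fs.mount call only works inside a Databricks notebook, so it is shown commented out):

```python
# Illustrative names -- substitute your own.
container, account = "my_container", "my_storage"
conf_key = f"fs.azure.sas.{container}.{account}.blob.core.windows.net"
source = f"wasbs://{container}@{account}.blob.core.windows.net"
mount_point = "/mnt/yourstoragepath"

# Inside a Databricks notebook (dbutils exists only there):
# dbutils.fs.mount(
#     source=source,
#     mount_point=mount_point,
#     extra_configs={conf_key: dbutils.secrets.get(scope="my_scope", key="my_key")})
```

Once mounted, reads and writes go through the /mnt/ path instead of the wasbs:// URL.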

To do this, set the storage account name and the SAS key of the storage on Databricks:

spark.conf.set(
  "fs.azure.account.key.<storage-account-name>.blob.core.windows.net",
  "<storage-account-access-key>")

spark.conf.set(
  "fs.azure.sas.<container-name>.<storage-account-name>.blob.core.windows.net",
  "<complete-query-string-of-sas-for-the-container>")

After setting this, try to read the file as shown below:


val df = spark.read.parquet("wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<directory-name>")
or

dbutils.fs.ls("wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<directory-name>")

To write, use this syntax:

df.write.mode("overwrite").option("path","/mnt/mountName/folder1/tablename").saveAsTable("database.tablename")
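Alternatively, if you only need the parquet file rather than a metastore table, writing straight to the mount path should also work. A sketch, with an illustrative mount-point and folder name (the spark write itself runs only in the notebook, so it is commented out):

```python
# Illustrative output path under the mount point.
out_path = "/mnt/yourstoragepath/f1/deh.parquet"

# Inside the notebook, where df100 is the dataframe read earlier:
# df100.write.mode("overwrite").parquet(out_path)
```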

Please refer to this official link

Upvotes: 0
