akhetos

Reputation: 706

Read data in blob storage in Databricks

I'm trying to read my data in Azure Blob Storage from Databricks:

spark.conf.set(
  "fs.azure.account.key.ACCOUNTNAME.blob.core.windows.net",
  "MYKEY")

This should allow me to connect to my blob storage.

Then, according to the documentation, it should be easy to access the files in my blob.

I have tried many things, but nothing works.

One example:

import pandas as pd

blob_url = "https://ACCOUNTNAME.blob.core.windows.net/BLOBNAME/PATH/file"
df = pd.read_csv(blob_url)

which returns:

HTTP Error 404: The specified resource does not exist.
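
Note: pd.read_csv fetches the URL over plain HTTPS and does not see the Spark configuration set above, so a private blob answers 404. One way to make the direct-URL approach work is to append a SAS token to the URL (a minimal sketch; the token value here is a hypothetical placeholder generated in the Azure portal):

import pandas as pd

# Hypothetical read-only SAS token (assumption: generated in the Azure portal)
sas_token = "sv=2019-02-02&ss=b&srt=sco&sp=rl&sig=REDACTED"

blob_url = "https://ACCOUNTNAME.blob.core.windows.net/BLOBNAME/PATH/file"

# Blob Storage authorizes the request via the SAS query string, so pandas can read it over HTTPS
df = pd.read_csv(blob_url + "?" + sas_token)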

Any ideas? I can show all my attempts with their error messages if needed.

Another error:

%scala

dbutils.fs.ls("wasbs://BLOBNAME@ACCOUNTNAME.blob.core.windows.net/PATH")

shaded.databricks.org.apache.hadoop.fs.azure.AzureException: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Container BLOBNAME in account ACCOUNTNAME.blob.core.windows.net not found, and we can't create it using anoynomous credentials, and no credentials found for them in the configuration.

Upvotes: 0

Views: 12761

Answers (1)

CHEEKATLAPRADEEP

Reputation: 12768

You may check out the code below to read data from Blob storage using Azure Databricks.

# Set up an account access key:
# Get the storage account name and key from the Azure portal

spark.conf.set("fs.azure.account.key.chepra.blob.core.windows.net", "gv7nVISerl8wbK9mPGm8TC3CQIEjV3Z5dQxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxldlOiA==")
df = spark.read.csv("wasbs://CONTAINERNAME@chepra.blob.core.windows.net/Azure/AzureCostAnalysis.csv", header="true")
df.show()
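
As an alternative, if you prefer simple path-style access, you can mount the container once with dbutils.fs.mount and read through the mount point (a sketch; CONTAINERNAME and the key value are placeholders):

# Mount the container under DBFS (one-time setup per workspace)
dbutils.fs.mount(
  source = "wasbs://CONTAINERNAME@chepra.blob.core.windows.net",
  mount_point = "/mnt/CONTAINERNAME",
  extra_configs = {"fs.azure.account.key.chepra.blob.core.windows.net": "MYKEY"})

# Read the same file through the mount point
df = spark.read.csv("/mnt/CONTAINERNAME/Azure/AzureCostAnalysis.csv", header="true")
df.show()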


For dbutils.fs.ls there is no need to use magic cells like %scala; you may use the code below to list all the files in the container:

# Get file information 
dbutils.fs.ls("wasbs://CONTAINERNAME@chepra.blob.core.windows.net/Azure")
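
dbutils.fs.ls returns a list of FileInfo objects, so you can also work with the results programmatically, for example (a small sketch using the same placeholder container):

# Print the path and size of each file in the folder
for f in dbutils.fs.ls("wasbs://CONTAINERNAME@chepra.blob.core.windows.net/Azure"):
    print(f.path, f.size)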


Hope this helps. Do let us know if you have any further queries.

Upvotes: 2
