Harry Leboeuf

Reputation: 745

Problems with Azure Databricks opening a file on the Blob Storage

With Azure Databricks I'm able to list the files in the blob storage and collect them in an array. But when I try to open one of the files I get an error, probably due to the special path syntax.

storage_account_name = "tesb"
storage_container_name = "rttracking-in"
storage_account_access_key = "xyz"
file_location = "wasbs://rttracking-in"
file_type = "xml"

spark.conf.set(
  "fs.azure.account.key."+storage_account_name+".blob.core.windows.net",
  storage_account_access_key)

xmlfiles = dbutils.fs.ls("wasbs://"+storage_container_name+"@"+storage_account_name+".blob.core.windows.net/")

import pandas as pd
import xml.etree.ElementTree as ET
import re
import os

firstfile = xmlfiles[0].path
root = ET.parse(firstfile).getroot()

The error is

IOError: [Errno 2] No such file or directory: u'wasbs://[email protected]/rtTracking_00001.xml'

Upvotes: 0

Views: 2081

Answers (2)

Harry Leboeuf

Reputation: 745

I mounted the storage, and then this does the trick:

firstfile = xmlfiles[0].path.replace('dbfs:','/dbfs')
root = ET.parse(firstfile).getroot()
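The path rewrite can be wrapped in a small helper (a sketch; `xmlfiles` is assumed to come from the `dbutils.fs.ls` call in the question, and the example path is hypothetical):

```python
def to_local_path(dbfs_path):
    """Translate a 'dbfs:/...' URI into the '/dbfs/...' FUSE path
    that local-file APIs such as ET.parse() can open."""
    return dbfs_path.replace('dbfs:', '/dbfs', 1)

# e.g. to_local_path('dbfs:/mnt/rttracking-in/rtTracking_00001.xml')
# returns '/dbfs/mnt/rttracking-in/rtTracking_00001.xml'
```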

Upvotes: 1

silent

Reputation: 16208

My guess is that ET.parse() does not know about the Spark context in which you set up the connection to the storage account. Alternatively, you can mount the storage; then you can access the files through native paths as if they were local.

See here: https://docs.databricks.com/spark/latest/data-sources/azure/azure-storage.html#mount-an-azure-blob-storage-container
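A mount along the lines of that doc might look like this (a sketch; the mount name is a placeholder, the account/container names are taken from the question, and `dbutils` is only available inside a Databricks notebook):

```python
def wasbs_url(container, account):
    # Build the wasbs:// source URL for a blob container.
    return "wasbs://{}@{}.blob.core.windows.net".format(container, account)

def mount_container(container, account, access_key, mount_name):
    # Only callable inside Databricks, where `dbutils` is defined.
    dbutils.fs.mount(
        source=wasbs_url(container, account),
        mount_point="/mnt/" + mount_name,
        extra_configs={
            "fs.azure.account.key.{}.blob.core.windows.net".format(account): access_key
        })

# In the notebook, something like:
# mount_container("rttracking-in", "tesb", storage_account_access_key, "rttracking-in")
```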

This should work then:

root = ET.parse("/mnt/<mount-name>/...")

Upvotes: 1
