Reputation: 745
With Azure Databricks I'm able to list the files in the blob storage and get them into an array. But when I try to open one of the files I get an error, probably due to the special path syntax.
# Connection details
storage_account_name = "tesb"
storage_container_name = "rttracking-in"
storage_account_access_key = "xyz"
file_location = "wasbs://rttracking-in"
file_type = "xml"

# Register the account key with the Spark configuration
spark.conf.set(
    "fs.azure.account.key." + storage_account_name + ".blob.core.windows.net",
    storage_account_access_key)

# List the XML files in the container
xmlfiles = dbutils.fs.ls("wasbs://" + storage_container_name + "@" + storage_account_name + ".blob.core.windows.net/")

import pandas as pd
import xml.etree.ElementTree as ET
import re
import os

# Try to parse the first file -- this is where it fails
firstfile = xmlfiles[0].path
root = ET.parse(firstfile).getroot()
The error is:
IOError: [Errno 2] No such file or directory: u'wasbs://rttracking-in@tesb.blob.core.windows.net/rtTracking_00001.xml'
Upvotes: 0
Views: 2081
Reputation: 745
I did mount the storage, and then this does the trick:

firstfile = xmlfiles[0].path.replace('dbfs:', '/dbfs')
root = ET.parse(firstfile).getroot()
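
For the record, the same translation works for every listed file. A minimal sketch, assuming the xmlfiles list and the ET import from the question:

# dbutils.fs.ls returns dbfs:/ URIs; swapping the scheme for the /dbfs
# FUSE prefix lets plain Python file I/O (and ElementTree) open them
for f in xmlfiles:
    local_path = f.path.replace('dbfs:', '/dbfs')
    root = ET.parse(local_path).getroot()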
Upvotes: 1
Reputation: 16208
My guess is that ET.parse() does not know the Spark context in which you set up the connection to the Storage Account. Alternatively, you can try mounting the storage; then you can access the files through native paths, as if they were local.
This should work then:
root = ET.parse("/mnt/<mount-name>/...")
Upvotes: 1