Reputation: 123
I am trying to use DBUtils and PySpark from a Jupyter notebook Python script (running in Docker) to access an Azure Data Lake blob. However, I can't get dbutils to be recognized (NameError: name 'dbutils' is not defined). I've tried explicitly importing DBUtils, as well as not importing it, since I read:
"An important point to remember is to never run import dbutils in your Python script. This command succeeds but clobbers all the commands so nothing works. It is imported by default." Link
I've also tried the solution posted here, but it still threw "KeyError: 'dbutils'":
spark.conf.set("fs.azure.account.key.<storage account>.blob.core.windows.net", "<storage account access key>")
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "true")
dbutils.fs.ls("abfss://<container>@<storage account>.dfs.core.windows.net/")
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "false")
Does anyone have a solution to this?
Upvotes: 0
Views: 4128
Reputation: 487
dbutils is only supported within Databricks. To access blob storage from a non-Databricks Spark environment, such as a VM on Azure or HDInsight Spark, you need to modify the core-site.xml file instead. Here is a quick guide for a stand-alone Spark environment.
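For reference, a minimal core-site.xml entry might look like the sketch below. This is an assumption on my part: the property name targets the abfss:// (ADLS Gen2) endpoint used in your question, and both placeholders are mine to fill in.

<!-- core-site.xml: gives Spark's Hadoop layer the account key.
     Use dfs.core.windows.net for abfss:// paths (ADLS Gen2),
     or blob.core.windows.net for wasbs:// paths. -->
<property>
  <name>fs.azure.account.key.<storage account>.dfs.core.windows.net</name>
  <value><storage account access key></value>
</property>

With that in place you can drop dbutils entirely and use plain Spark APIs, e.g. spark.read.parquet("abfss://<container>@<storage account>.dfs.core.windows.net/<path>").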
Upvotes: 2