Reputation: 2757
I have limited knowledge of Python and Python functions, but I believe I have a grasp of the fundamentals. I was provided with a function, which I have added to a module called entity.
When I try to call the function, I get the error:
NameError: name 'dbutils' is not defined
The full error is as follows:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<command-2967368096587460> in <module>
----> 1 entity.rename_file(stagingLocation+"/tempDelta",saveloc+"/past_files","csv","mytest")
/databricks/python/lib/python3.8/site-packages/hydr8/cln/entity.py in rename_file(origin_path, dest_path, file_type, new_name)
235
236 def rename_file(origin_path, dest_path, file_type, new_name):
--> 237 filelist = dbutils.fs.ls(origin_path)#list all files from origin path
238 filtered_filelist = [x.name for x in filelist if x.name.endswith("."+file_type)]#keep names of the files that match the type requested
239 if len(filtered_filelist) > 1:#check if we have more than 1 files of that type
NameError: name 'dbutils' is not defined
I'm attempting to call the function in a Databricks notebook with the following code:
entity.rename_file(parameter1,parameter2,parameter3,parameter4)
The function is as follows:
def rename_file(origin_path, dest_path, file_type, new_name):
    filelist = dbutils.fs.ls(origin_path)  # list all files from origin path
    filtered_filelist = [x.name for x in filelist if x.name.endswith("."+file_type)]  # keep names of the files that match the type requested
    if len(filtered_filelist) > 1:  # check if we have more than 1 file of that type
        print("Too many "+file_type+" files. You will need a different implementation")
    elif len(filtered_filelist) == 0:  # check if there are no files of that type
        print("No "+file_type+" files found")
    else:
        dbutils.fs.mv(origin_path+"/"+filtered_filelist[0], dest_path+"/"+new_name+"."+file_type)  # move the file to a new path (can be the same), changing the name in the process
The function was imported into the module from VSCode as a Python wheel as follows:
Do I need to define dbutils within VSCode? I ask because if I define the function directly in a Databricks notebook and then call it as follows:
rename_file(parameter1,parameter2,parameter3,parameter4)
the function runs perfectly fine.
Upvotes: 2
Views: 826
Reputation: 1722
To develop code in Visual Studio Code you need to use the databricks-connect library. It will execute your code on a Spark cluster.
However, it has a number of limitations.
More information here: https://docs.databricks.com/dev-tools/databricks-connect.html#requirements
The community is aware of these limitations, which is why databricks-tunnel is expected to be available in early 2022. It will run your code on the Databricks cloud rather than directly on the cluster, and there will be ready-made extensions for PyCharm and VS Code. This comes from a roadmap meeting held last week.
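As for the NameError itself: dbutils is a variable that Databricks places in the notebook's namespace, so a function defined in the notebook can see it, while a function living in an installed module such as entity looks the name up in that module's own globals and does not find it. One way around this, sketched below, is to pass dbutils into the function explicitly. Note that the extra dbutils parameter and the True/False return value are my own assumptions for illustration, not part of the original function.

```python
def rename_file(dbutils, origin_path, dest_path, file_type, new_name):
    """Move the single file of `file_type` found in origin_path to dest_path/new_name.

    `dbutils` is supplied by the caller (assumed change to the original signature),
    so the module no longer depends on a notebook-only global.
    Returns True if a file was moved, False otherwise (also an assumption).
    """
    suffix = "." + file_type
    # List the origin path and keep only the names with the requested extension.
    matches = [f.name for f in dbutils.fs.ls(origin_path) if f.name.endswith(suffix)]

    if len(matches) > 1:
        print("Too many " + file_type + " files. You will need a different implementation")
        return False
    if not matches:
        print("No " + file_type + " files found")
        return False

    # Exactly one match: move it, renaming it in the process.
    dbutils.fs.mv(origin_path + "/" + matches[0], dest_path + "/" + new_name + suffix)
    return True
```

From the notebook, the call would then look like entity.rename_file(dbutils, stagingLocation+"/tempDelta", saveloc+"/past_files", "csv", "mytest"). A side benefit of this design is that the function can be tested outside Databricks with a stand-in object that has fs.ls and fs.mv. Another commonly shown option is to build the handle inside the module with pyspark.dbutils.DBUtils(spark); please check that against your runtime or databricks-connect version before relying on it.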
Upvotes: 1