Patterson

Reputation: 2757

Unable to call a function in Apache Spark with Databricks

I have limited knowledge of Python and Python functions, though I believe I have a grasp of the fundamentals. I was provided with a function that I have imported into a module called entity.

When I try to call the function I get the error:

NameError: name 'dbutils' is not defined

The full error is as follows:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<command-2967368096587460> in <module>
----> 1 entity.rename_file(stagingLocation+"/tempDelta",saveloc+"/past_files","csv","mytest")

/databricks/python/lib/python3.8/site-packages/hydr8/cln/entity.py in rename_file(origin_path, dest_path, file_type, new_name)
    235 
    236 def rename_file(origin_path, dest_path, file_type, new_name):
--> 237   filelist = dbutils.fs.ls(origin_path)#list all files from origin path
    238   filtered_filelist = [x.name for x in filelist if x.name.endswith("."+file_type)]#keep names of the files that match the type requested
    239   if len(filtered_filelist) > 1:#check if we have more than 1 files of that type

NameError: name 'dbutils' is not defined

I'm attempting to call the function in a Databricks notebook with the following code:

entity.rename_file(parameter1, parameter2, parameter3, parameter4)

The function is as follows:

def rename_file(origin_path, dest_path, file_type, new_name):
  filelist = dbutils.fs.ls(origin_path)  # list all files in the origin path
  filtered_filelist = [x.name for x in filelist if x.name.endswith("." + file_type)]  # keep the names of files matching the requested type
  if len(filtered_filelist) > 1:  # check if there is more than one file of that type
    print("Too many " + file_type + " files. You will need a different implementation")
  elif len(filtered_filelist) == 0:  # check if there are no files of that type
    print("No " + file_type + " files found")
  else:
    dbutils.fs.mv(origin_path + "/" + filtered_filelist[0], dest_path + "/" + new_name + "." + file_type)  # move the file to a new path (can be the same), renaming it in the process

The module was built in VSCode and installed as a Python wheel, as shown here:

[screenshot: building the Python wheel in VSCode]

Do I need to define dbutils within VSCode? Because if I run the function directly from Databricks and then call the function as follows:

rename_file(parameter1, parameter2, parameter3, parameter4)

the function runs perfectly fine.

Upvotes: 2

Views: 826

Answers (1)

Hubert Dudek

Reputation: 1722

To develop code in Visual Studio Code you need to use the databricks-connect library, which executes your code on a Spark cluster.

However, it has a number of serious limitations:

  • code is executed on the cluster, not inside the Databricks environment,
  • only some runtime versions are supported,
  • you need the same minor version of Python on your machine as the runtime on the server,
  • several features (like streaming) are not supported (but dbutils, which you mention, is supported).

More information here: https://docs.databricks.com/dev-tools/databricks-connect.html#requirements
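As a side note, a way to avoid the NameError without any extra tooling is to pass the notebook's dbutils handle into the function instead of relying on a global. This is a sketch, assuming entity.py can be edited (the refactored signature below is hypothetical, not the asker's original):

```python
# Hypothetical refactor of entity.py: dbutils is taken as a parameter,
# so the module never depends on the notebook-scoped global.
def rename_file(dbutils, origin_path, dest_path, file_type, new_name):
    filelist = dbutils.fs.ls(origin_path)  # list all files in the origin path
    filtered = [x.name for x in filelist if x.name.endswith("." + file_type)]
    if len(filtered) > 1:  # more than one matching file: refuse to guess
        print("Too many " + file_type + " files. You will need a different implementation")
    elif len(filtered) == 0:  # nothing to rename
        print("No " + file_type + " files found")
    else:
        # move the single match, renaming it in the process
        dbutils.fs.mv(origin_path + "/" + filtered[0],
                      dest_path + "/" + new_name + "." + file_type)
```

In the notebook, the call then becomes `entity.rename_file(dbutils, stagingLocation + "/tempDelta", saveloc + "/past_files", "csv", "mytest")`, handing in the dbutils object that Databricks injects into the notebook scope.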

The community is aware of these limitations, which is why databricks-tunnel is planned for early 2022; it will run your code in the Databricks cloud rather than directly on the cluster. There will be ready-made extensions for PyCharm and VS Code. Below is a picture from last week's roadmap meeting:

[screenshot: slide from the Databricks roadmap meeting]

Upvotes: 1
