Carltonp

Reputation: 1344

Remove Files from Directory after uploading in Databricks using dbutils

A very clever person from StackOverflow assisted me in copying files to a directory from Databricks here: copyfiles

I am using the same principle to remove the files once they have been copied, as shown in the link:

for i in range (0, len(files)):
  file = files[i].name
  if now in file:  
    dbutils.fs.rm(files[i].path,'/mnt/adls2/demo/target/' + file)
    print ('copied     ' + file)
  else:
    print ('not copied ' + file)

However, I'm getting the error:

TypeError: '/mnt/adls2/demo/target/' has the wrong type - class bool is expected.

Can someone let me know how to fix this? I thought it would be a simple matter of removing the file after copying it, using the command dbutils.fs.rm.

Upvotes: 10

Views: 69226

Answers (3)

Nikunj Kakadiya

Reputation: 2998

If you have a huge number of files, deleting them in this way might take a lot of time. You can utilize Spark parallelism to delete the files in parallel. The answer I am providing is in Scala but can be changed to Python.

You can check whether the directory exists using the function below:

import java.io._
def CheckPathExists(path:String): Boolean = 
{
  try
  {
    dbutils.fs.ls(path)
    return true
  }
  catch
  {
    case ioe:java.io.FileNotFoundException => return false
  }
}
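Since the question is in Python, the same existence check could be sketched roughly as follows. This is only a sketch under assumptions: `path_exists` is a hypothetical helper name, `fs` is assumed to be `dbutils.fs` (or any object with the same `ls` method), and it assumes a missing path surfaces as an exception whose message mentions `FileNotFoundException`.

```python
def path_exists(fs, path):
    """Return True if `fs.ls(path)` succeeds, False if the path is missing.

    `fs` is assumed to be `dbutils.fs` or an object with the same `ls` method.
    """
    try:
        fs.ls(path)
        return True
    except Exception as exc:
        # Assumption: a missing path is reported with an error message that
        # mentions java.io.FileNotFoundException. Anything else is re-raised
        # so that unrelated failures (permissions, typos) are not hidden.
        if "FileNotFoundException" in str(exc):
            return False
        raise
```

In a notebook this would be called as `path_exists(dbutils.fs, '/mnt/adls2/demo/target/')`.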

You can define a function that deletes the files. Create this function inside an object that extends Serializable, as below:

object Helper extends Serializable
{
  def delete(directory: String): Unit = {
    dbutils.fs.ls(directory).map(_.path).toDF.foreach { filePath =>
      println(s"deleting file: $filePath")
      dbutils.fs.rm(filePath(0).toString, true)
    }
  }
}

Now you can first check whether the path exists and, if it does, call the delete function to delete the files within the folder across multiple tasks.

val directoryPath = "<location>"
val directoryExists = CheckPathExists(directoryPath)
if (directoryExists)
{
  Helper.delete(directoryPath)
}
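A rough Python counterpart is sketched below. Note that it swaps the technique: instead of distributing the deletes as Spark tasks, it issues them concurrently from a thread pool on the driver. `delete_in_parallel` and `max_workers` are hypothetical names chosen for illustration, and `fs` is assumed to be `dbutils.fs`.

```python
from concurrent.futures import ThreadPoolExecutor


def delete_in_parallel(fs, directory, max_workers=8):
    """Remove every entry directly under `directory` using a thread pool.

    `fs` is assumed to be `dbutils.fs`; `max_workers` is an arbitrary default.
    Returns the list of paths that were passed to `rm`.
    """
    paths = [entry.path for entry in fs.ls(directory)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # rm(path, True) removes recursively, so sub-directories are covered.
        # list(...) waits for every task and re-raises the first failure.
        list(pool.map(lambda p: fs.rm(p, True), paths))
    return paths
```

In a notebook this would be called as `delete_in_parallel(dbutils.fs, directory_path)`, ideally after confirming that the directory exists.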

Upvotes: 2

Shikha

Reputation: 237

To remove files from DBFS, you can run this in any notebook cell:

%fs rm -r dbfs:/user/sample_data.parquet

Upvotes: 2

Fabio Schultz

Reputation: 527

If you want to delete all files from the following path: '/mnt/adls2/demo/target/', there is a simple command:

dbutils.fs.rm('/mnt/adls2/demo/target/', True)

Anyway, if you want to use your code, take a look at the dbutils documentation:

rm(dir: String, recurse: boolean = false): boolean -> Removes a file or directory

The second argument of the function is expected to be a boolean, but your code passes a string containing a path:

dbutils.fs.rm(files[i].path, '/mnt/adls2/demo/target/' + file)

So your new code could be the following:

for i in range(0, len(files)):
    file = files[i].name
    if now in file:
        # files[i].path is already the full path of the file,
        # so the file name must not be appended to it again
        dbutils.fs.rm(files[i].path, True)
        print('removed     ' + file)
    else:
        print('not removed ' + file)
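The same loop can also be wrapped in a small helper, which makes it easier to try out away from the cluster. This is a minimal sketch, not part of dbutils: `remove_matching` is a hypothetical name, `fs` is assumed to be `dbutils.fs`, and `token` plays the role of `now` in the question.

```python
def remove_matching(fs, directory, token):
    """Remove files in `directory` whose name contains `token`.

    `fs` is assumed to be `dbutils.fs`; returns the names that were removed.
    """
    removed = []
    for entry in fs.ls(directory):
        if token in entry.name:
            # entry.path is already the full path. The second argument is the
            # boolean `recurse` flag, not a destination path.
            fs.rm(entry.path, False)
            removed.append(entry.name)
    return removed
```

In a notebook this would be called as `remove_matching(dbutils.fs, source_dir, now)`, where `source_dir` stands for whichever folder `files` was listed from.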

Upvotes: 20
