Krishna Reddy

Reputation: 1099

How to move files with the same extension in the Databricks File System?

I get a FileNotFoundException when I try to move a file in DBFS using a * wildcard. Both the source and destination directories are in DBFS. The source file, "test_sample.csv", exists in the DBFS directory, and I run the following command from a notebook cell:

dbutils.fs.mv("dbfs:/usr/krishna/sample/test*.csv", "dbfs:/user/abc/Test/Test.csv")

Error:

java.io.FileNotFoundException: dbfs:/usr/krishna/sample/test*.csv

I'd appreciate any help. Thanks.

Upvotes: 19

Views: 50198

Answers (4)

ruloweb

Reputation: 744

If you run your code on a Databricks cluster, you can access DBFS through the node's local file system, under /dbfs. I'm not sure whether it fetches all the objects in the background and then filters them, but at least you can use wildcards. For example, from a Databricks notebook:

%sh
ls /dbfs/cluster-logs/*/driver/log4j-2021-09-01*
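
The same idea can be used to move files rather than just list them. Below is a minimal sketch in Python, assuming the /dbfs mount is available on the driver node and that moving files through it is acceptable for your workload. The helper name move_matching is hypothetical, not part of any Databricks API; it only uses the standard glob and shutil modules.

```python
import glob
import os
import shutil


def move_matching(pattern, dst_dir):
    """Move every regular file matching a glob pattern into dst_dir.

    Returns the list of destination paths, in sorted order of the sources.
    """
    os.makedirs(dst_dir, exist_ok=True)
    moved = []
    for src in sorted(glob.glob(pattern)):
        if os.path.isfile(src):  # skip directories that happen to match
            dst = os.path.join(dst_dir, os.path.basename(src))
            shutil.move(src, dst)
            moved.append(dst)
    return moved


# On a cluster, with the paths from the question (assumes the /dbfs mount):
# move_matching("/dbfs/usr/krishna/sample/test*.csv", "/dbfs/user/abc/Test")
```

Note that this keeps the original file names; renaming several matches to one target name such as Test.csv would overwrite all but the last one.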

Upvotes: 1

Parvathirajan Natarajan

Reputation: 1305

Since wildcards are not allowed, we need to work around it the traditional way: list the files, then move or copy the ones that match.

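import os

def db_list_files(file_path, file_prefix):
  # Return the full paths of the entries in file_path whose base names start with file_prefix.
  file_list = [file.path for file in dbutils.fs.ls(file_path) if os.path.basename(file.path).startswith(file_prefix)]
  return file_list

files = db_list_files('dbfs:/your/src_dir', 'foobar')

for file in files:
  # Copy each match into the target directory, keeping its name.
  # Use dbutils.fs.mv instead of dbutils.fs.cp to move rather than copy.
  dbutils.fs.cp(file, os.path.join('dbfs:/your/tgt_dir', os.path.basename(file)))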

Upvotes: 5

chetan_surwade

Reputation: 82

dbutils.fs.mv("file:/<source>", "dbfs:/<destination>", recurse=True)

Use the above command to move a local folder to DBFS.

Upvotes: -1

Hauke Mallow

Reputation: 3182

Wildcards are currently not supported with dbutils. You can move the whole directory:

dbutils.fs.mv("dbfs:/tmp/test", "dbfs:/tmp/test2", recurse=True)

or just a single file:

dbutils.fs.mv("dbfs:/tmp/test/test.csv", "dbfs:/tmp/test2/test2.csv")

As mentioned in the comments below, you can implement the wildcard logic yourself in Python. See also the code examples in my following answer.
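
A minimal sketch of that wildcard logic, assuming the entries returned by dbutils.fs.ls expose path and name attributes and that directory names end with "/". The helper mv_wildcard is a hypothetical name, not part of dbutils; it takes the file system object as a parameter so it can also be tried outside a notebook with a stand-in object.

```python
import fnmatch
import posixpath


def mv_wildcard(fs, src_dir, pattern, dst_dir):
    """Move files in src_dir whose names match a shell-style pattern.

    fs is expected to behave like dbutils.fs: fs.ls(dir) returns entries
    with .path and .name attributes, and fs.mv(src, dst) moves one file.
    Returns the list of destination paths.
    """
    moved = []
    for entry in fs.ls(src_dir):
        # Assumption: directory entries have names ending in "/"; skip them.
        if entry.name.endswith("/"):
            continue
        # fnmatchcase matches case-sensitively on every platform.
        if fnmatch.fnmatchcase(entry.name, pattern):
            dst = posixpath.join(dst_dir, entry.name)
            fs.mv(entry.path, dst)
            moved.append(dst)
    return moved


# In a notebook, with the paths from the question:
# mv_wildcard(dbutils.fs, "dbfs:/usr/krishna/sample", "test*.csv", "dbfs:/user/abc/Test")
```

The listing is not recursive, so only files directly inside src_dir are considered.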

Upvotes: 22
