kanishk kashyap

Reputation: 63

Listing files on Microsoft Azure Databricks

I'm working in Microsoft Azure Databricks. Using the %fs ls command, I found that there is a CSV file present in a directory (see the first snippet below). But when I try to collect the CSV file into a list using glob, it returns an empty list (see the second snippet).

How can I list the contents of a directory in Databricks?

%fs ls /FileStore/tables/26AS_report/normalised_consol_file_record_level/part1/customer_pan=AAACD3312M/

import glob

path = "/FileStore/tables/26AS_report/normalised_consol_file_record_level/part1/customer_pan=AAACD3312M/"
result = glob.glob(path + '/**/*.csv', recursive=True)
print(result)  # prints [] even though %fs ls shows a CSV file

Upvotes: 1

Views: 8410

Answers (2)

Alex Ott

Reputation: 87069

glob is a local file-system operation that doesn't know about DBFS. If you want to use it, you need to prepend /dbfs to your path:

path = "/dbfs/FileStore/tables/26AS_report/....."
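
For instance, a minimal sketch using the full path from the question (assuming the same directory layout; /dbfs exposes DBFS as a local mount on the driver node, so standard file-system APIs can see the files):

import glob

# with the /dbfs prefix, glob operates on the DBFS mount like any local directory
path = "/dbfs/FileStore/tables/26AS_report/normalised_consol_file_record_level/part1/customer_pan=AAACD3312M/"
result = glob.glob(path + '/**/*.csv', recursive=True)
print(result)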

Upvotes: 1

wovano

Reputation: 5073

I don't think you can use standard Python file-system functions from the os.path or glob modules to access DBFS paths directly.

Instead, you should use the Databricks file system utility (dbutils.fs). See the documentation.

Given your example code, you should do something like:

dbutils.fs.ls(path)

or

dbutils.fs.ls('dbfs:' + path)

This should give a list of files, which you may have to filter yourself to keep only the *.csv files.
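
As a rough sketch of that filtering step (assuming the path variable from the question; dbutils.fs.ls returns FileInfo objects, which have name and path attributes):

# list the directory via DBFS and keep only the CSV files
files = dbutils.fs.ls(path)
csv_files = [f.path for f in files if f.name.endswith('.csv')]
print(csv_files)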

Upvotes: 1
