Reputation: 63
I'm working in Microsoft Azure Databricks. Using the ls command, I found that a CSV file is present in a directory (see the first snippet below). But when I try to pick the CSV file into a list using glob, it returns an empty list (see the second snippet).
How can I list the contents of a directory in Databricks?
%fs
ls /FileStore/tables/26AS_report/normalised_consol_file_record_level/part1/customer_pan=AAACD3312M/
import glob

path = "/FileStore/tables/26AS_report/normalised_consol_file_record_level/part1/customer_pan=AAACD3312M/"
result = glob.glob(path + '/**/*.csv', recursive=True)
print(result)
Upvotes: 1
Views: 8410
Reputation: 87069
glob is a local file-level operation that doesn't know about DBFS. If you want to use it, you need to prepend /dbfs to your path:

path = "/dbfs/FileStore/tables/26AS_report/....."
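Putting that together with the code from the question, a minimal sketch (assuming the cluster exposes the /dbfs FUSE mount, which standard Databricks clusters do):

import glob

# With the /dbfs prefix, DBFS is visible as a local POSIX path on the driver,
# so standard Python file APIs such as glob can traverse it.
path = "/dbfs/FileStore/tables/26AS_report/normalised_consol_file_record_level/part1/customer_pan=AAACD3312M/"
result = glob.glob(path + '/**/*.csv', recursive=True)
print(result)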
Upvotes: 1
Reputation: 5073
I don't think you can use standard Python file system functions from the os.path or glob modules. Instead, you should use the Databricks file system utility (dbutils.fs). See the documentation.
Given your example code, you should do something like:
dbutils.fs.ls(path)
or
dbutils.fs.ls('dbfs:' + path)
This should give a list of files that you may have to filter yourself to only get the *.csv files.
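For example, a sketch of that filtering step on the path from the question (dbutils is predefined in Databricks notebooks, and dbutils.fs.ls returns FileInfo entries with a path attribute):

path = "/FileStore/tables/26AS_report/normalised_consol_file_record_level/part1/customer_pan=AAACD3312M/"
# Keep only the entries whose path ends with .csv; subdirectories end with '/'.
csv_files = [f.path for f in dbutils.fs.ls(path) if f.path.endswith('.csv')]
print(csv_files)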
Upvotes: 1