Reputation: 149
I'm attempting to crawl through a directory in a Databricks notebook to find the latest parquet file. dbutils.fs.ls does not appear to return any metadata about files or folders. Are there any alternative methods in Python to do this? The data is stored in an Azure Data Lake mounted to the DBFS under "/mnt/foo". Any help or pointers are appreciated.
Upvotes: 2
Views: 8796
Reputation: 24138
On Azure Databricks, as far as I know, the DBFS path dbfs:/mnt/foo is the same as the Linux path /dbfs/mnt/foo, so you can simply use os.stat(path) in Python to get file metadata such as the modification date (note that on Linux, st_ctime is the inode change time rather than a true creation time).
Here is my sample code.
import os
from datetime import datetime

path = '/dbfs/mnt/test'

# Build the full path for each entry in the directory
fdpaths = [path + "/" + fd for fd in os.listdir(path)]

for fdpath in fdpaths:
    # os.stat returns the metadata, including st_ctime and st_mtime
    statinfo = os.stat(fdpath)
    create_date = datetime.fromtimestamp(statinfo.st_ctime)
    modified_date = datetime.fromtimestamp(statinfo.st_mtime)
    print("The statinfo of path %s is %s, \n\twhose create date and modified date are %s and %s" % (fdpath, statinfo, create_date, modified_date))
The result prints, for each path, the full stat info along with its create and modified dates.
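To tie this back to the original question, here is a minimal sketch that uses the same idea to pick the newest parquet file by modification time. It assumes the parquet files sit directly under the mount (the /dbfs/mnt/foo path comes from the question); adjust the pattern if your files live in subdirectories.

import glob
import os

# The mount path from the question; change to your own mount point.
path = '/dbfs/mnt/foo'

# Collect only the parquet files directly under the mount.
parquet_files = glob.glob(path + '/*.parquet')

if parquet_files:
    # Pick the file with the most recent modification time.
    latest = max(parquet_files, key=os.path.getmtime)
    print("Latest parquet file: %s" % latest)
else:
    print("No parquet files found under %s" % path)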
Upvotes: 4