Daveed

Reputation: 149

Does Databricks DBFS support file metadata such as file/folder create date or modified date?

I'm attempting to crawl through a directory in a Databricks notebook to find the latest parquet file. dbutils.fs.ls does not appear to return any metadata about files or folders. Are there any alternative methods in Python to do this? The data is stored in an Azure Data Lake mounted to the DBFS under "/mnt/foo". Any help or pointers are appreciated.

Upvotes: 2

Views: 8796

Answers (1)

Peter Pan

Reputation: 24138

As far as I know, on Azure Databricks the DBFS path dbfs:/mnt/foo is the same as the Linux path /dbfs/mnt/foo, so you can simply use os.stat(path) in Python to get file metadata such as the create date or modified date.


Here is my sample code.

import os
from datetime import datetime

path = '/dbfs/mnt/test'
# Build the full local path for each entry under the mount point
fdpaths = [path + "/" + fd for fd in os.listdir(path)]
for fdpath in fdpaths:
    # os.stat works because the DBFS mount is exposed on the driver's local filesystem
    # Note: on Linux, st_ctime is the inode change time rather than a true creation time
    statinfo = os.stat(fdpath)
    create_date = datetime.fromtimestamp(statinfo.st_ctime)
    modified_date = datetime.fromtimestamp(statinfo.st_mtime)
    print("The statinfo of path %s is %s,\n\twhose create date and modified date are %s and %s" % (fdpath, statinfo, create_date, modified_date))

Running this prints the stat info together with the create date and modified date for each path.

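To get at the original goal of finding the latest parquet file under the mount, the same os.stat metadata can be combined with a recursive walk. Here is a minimal sketch, assuming the "/mnt/foo" mount from the question (accessed as /dbfs/mnt/foo) and that "latest" means the most recent modification time:

import os

path = '/dbfs/mnt/foo'  # DBFS mount from the question, accessed via the local /dbfs path
parquet_files = []
# Walk the directory tree and record (modified time, path) for every parquet file
for root, dirs, files in os.walk(path):
    for name in files:
        if name.endswith('.parquet'):
            full_path = os.path.join(root, name)
            parquet_files.append((os.stat(full_path).st_mtime, full_path))

if parquet_files:
    # max on (mtime, path) tuples picks the most recently modified file
    latest_mtime, latest_file = max(parquet_files)
    print("Latest parquet file: %s" % latest_file)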

Upvotes: 4
