Reputation: 2080
I'm trying to get file creation metadata.
File is in: Azure Storage
Accessing data through: Databricks
Right now I'm using:
file_path = my_storage_path
dbutils.fs.ls(file_path)
but it returns:
[FileInfo(path='path_myFile.csv', name='fileName.csv', size=437940)]
I do not get any information about creation time. Is there a way to get that information?
Other solutions on Stack Overflow refer to files that are already in Databricks (Does databricks dbfs support file metadata such as file/folder create date or modified date). In my case we access the data from Databricks, but the data is in Azure Storage.
Upvotes: 1
Views: 1237
Reputation: 87174
It really depends on the version of Databricks Runtime (DBR) that you're using. For example, the modification timestamp is available in the output of dbutils.fs.ls if you use DBR 10.2 (I didn't test with 10.0/10.1, but it's definitely not available on 9.1).
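To illustrate, on those newer runtimes each entry returned by dbutils.fs.ls carries a modificationTime field in epoch milliseconds. Since dbutils only exists inside a Databricks notebook, here is a minimal sketch using a hypothetical stand-in namedtuple with the same shape as the returned FileInfo entries, e.g. to find the most recently modified file:

```python
from collections import namedtuple

# Hypothetical stand-in for the FileInfo tuples that dbutils.fs.ls
# returns on DBR 10.2+, where modificationTime (epoch millis) is present.
FileInfo = namedtuple("FileInfo", ["path", "name", "size", "modificationTime"])

def newest_file(listing):
    """Return the entry with the most recent modification time."""
    return max(listing, key=lambda f: f.modificationTime)

# Sample listing; inside Databricks this would be dbutils.fs.ls(my_storage_path)
listing = [
    FileInfo("dbfs:/a.csv", "a.csv", 100, 1_640_000_000_000),
    FileInfo("dbfs:/b.csv", "b.csv", 200, 1_650_000_000_000),
]
print(newest_file(listing).name)  # b.csv
```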
If you need that information on older runtimes, you can use the Hadoop FileSystem API via the Py4j gateway, like this:
# Import the Java/Hadoop classes through the Py4j gateway
URI = sc._gateway.jvm.java.net.URI
Path = sc._gateway.jvm.org.apache.hadoop.fs.Path
FileSystem = sc._gateway.jvm.org.apache.hadoop.fs.FileSystem
Configuration = sc._gateway.jvm.org.apache.hadoop.conf.Configuration

# Get the FileSystem for the given URI and list the directory contents
fs = FileSystem.get(URI("/tmp"), Configuration())
status = fs.listStatus(Path('/tmp/'))
for fileStatus in status:
    # getModificationTime() returns epoch milliseconds
    print(f"path={fileStatus.getPath()}, size={fileStatus.getLen()}, mod_time={fileStatus.getModificationTime()}")
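The mod_time printed above is in epoch milliseconds. If you want a human-readable timestamp, a small conversion like this works (mod_millis is a sample value for illustration):

```python
from datetime import datetime, timezone

# Hadoop's getModificationTime() returns epoch milliseconds (UTC);
# divide by 1000 to get seconds before converting.
mod_millis = 1640995200000
mod_time = datetime.fromtimestamp(mod_millis / 1000, tz=timezone.utc)
print(mod_time.isoformat())  # 2022-01-01T00:00:00+00:00
```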
Upvotes: 2