gorrch

Reputation: 551

List JSON files, partitioned by year/month/day, from an Azure storage account using PySpark

I have an Azure storage account with JSON files, partitioned by year/month/day/hour. I need to list all JSONs between two dates, e.g. 20200505 to 20201220, so that I end up with a list of URLs/directories. I do not need to load any content, just to list all files that live between these two dates.

I need to use Azure Databricks with PySpark for this. Is it possible to just use something like:

.load(from "<Path>/y=2020/month=05/day=05/**/*.json" to "<Path>/y=2020/month=12/day=20/**/*.json")

Here is the structure of the Azure storage account: [screenshot of the partitioned year/month/day/hour folder hierarchy]

Upvotes: 1

Views: 209

Answers (1)

mck

Reputation: 42352

Spark does not provide a generic way of selecting an interval of date partitions, but you can try to specify the ranges manually as below:

.load([
    "<Path>/year=2020/month=05/day={0[5-9],[1-3][0-9]}/**/*.json",    # May 05-31
    "<Path>/year=2020/month={0[6-9],1[0-1]}/day=[0-3][0-9]/**/*.json",  # Jun-Nov, all days
    "<Path>/year=2020/month=12/day={[0-1][0-9],20}/**/*.json",        # Dec 01-20
])

Note that in PySpark the paths must be passed as a single list: unlike in Scala, extra positional string arguments to .load would be interpreted as the format and schema parameters.

Upvotes: 2
