samba

Reputation: 3111

Spark - How to get the latest hour in S3 path?

I'm using a Databricks notebook with Spark and Scala to read data from S3 into a DataFrame:

val myDf = spark.read.parquet(s"s3a://data/metrics/*/*/*/"), where the * wildcards represent year/month/day.

Or I just hardcode it: val myDf = spark.read.parquet(s"s3a://data/metrics/2018/05/20/")

Now I want to add an hour parameter right after the day. The idea is to obtain data from S3 for the most recently available hour.

If I do val myDf = spark.read.parquet(s"s3a://data/metrics/2018/05/20/*") then I'll get data for all hours of May 20th.

How is it possible to achieve this in a Databricks notebook without hardcoding the hour?

Upvotes: 0

Views: 309

Answers (1)

justcode

Reputation: 128

Use the datetime module:

from datetime import datetime, timedelta

latest_hour = datetime.now() - timedelta(hours=1)

You can also access the components individually by year, month, day and hour:

latest_hour.year
latest_hour.month
latest_hour.day
latest_hour.hour
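Putting it together, a minimal sketch of building the S3 path for the previous hour, assuming the layout is zero-padded YYYY/MM/DD/HH (the bucket and prefix names here just mirror the question's example):

```python
from datetime import datetime, timedelta

# One hour before the current wall-clock time
latest_hour = datetime.now() - timedelta(hours=1)

# strftime zero-pads month/day/hour, matching a YYYY/MM/DD/HH layout
path = latest_hour.strftime("s3a://data/metrics/%Y/%m/%d/%H/")

# myDf = spark.read.parquet(path)  # read only that hour's partition
```

Note that this derives the hour from the clock, not from what actually exists in S3; if ingestion lags, the computed prefix may be empty, in which case listing the bucket's keys to find the newest hour is the safer approach.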

Upvotes: 1

Related Questions