Reputation: 29
I need to automatically read a delta table, and only the last partition that was created. The whole delta table is big. The delta is partitioned by yyyy and mm:
val df = spark.read.format("delta").load("url_delta").where(s"yyyy=${yyyy} and mm=${mm}")
I need to know the values of the yyyy year and mm month of the latest partition. It is not efficient to read the whole delta table and filter it by max("yyyy") and max("mm").
Upvotes: 1
Views: 740
Reputation: 2448
Actually, if you partition on yyyy and mm, then getting the max year and month is a metadata-only operation: Spark only has to look at the transaction log, not the data files, so it should be really quick.
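One caveat worth noting: taking max("yyyy") and max("mm") as two independent aggregates can pick a partition that doesn't exist (e.g. max month 12 from an older year combined with the newest year). You want the maximum of the (yyyy, mm) *pair*. A minimal sketch of that selection logic in plain Scala, assuming you have already collected the distinct partition values (for example via `df.select("yyyy", "mm").distinct().collect()`, which Delta can serve from the log):

```scala
// Hypothetical distinct (yyyy, mm) partition values collected from the table.
val partitions = Seq((2021, 10), (2021, 11), (2021, 12), (2022, 1))

// Independent maxima would give (2022, 12) -- a partition that doesn't exist.
// Tuple ordering compares yyyy first, then mm, which is what we actually want.
val (latestYyyy, latestMm) = partitions.max

println(s"yyyy=$latestYyyy, mm=$latestMm") // yyyy=2022, mm=1
```

You can then plug the resulting values into the `.where(s"yyyy=$latestYyyy and mm=$latestMm")` filter from the question, so only that one partition is scanned.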
Upvotes: 3