AFC

Reputation: 29

Read only the latest Delta partition without reading the whole Delta table

I need to read a Delta table automatically, and I only need the latest partition that was created. The whole table is big. It is partitioned by yyyy and mm:

val df = spark.read.format("delta").load("url_delta").where(s"yyyy=${yyyy} and mm=${mm}")

To do that I need to know the values of yyyy (year) and mm (month). It is not efficient to read the whole table and filter it by max("yyyy") and max("mm").
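One subtlety with the max-based idea: taking max("yyyy") and max("mm") independently can name a partition that does not exist. For example, with partitions 2023-12 and 2024-02 it yields 2024-12. The latest partition is the maximum of the (yyyy, mm) pair, compared year first. Below is a minimal plain-Scala sketch of the difference. It assumes the partition values have already been collected as integers, and the object and `Partition` case class are hypothetical names used only for illustration:

```scala
object LatestPartition {
  // Hypothetical representation of one partition's (yyyy, mm) values.
  final case class Partition(yyyy: Int, mm: Int)

  // Naive: year and month maxima taken independently
  // (what filtering by max("yyyy") and max("mm") amounts to).
  def naive(ps: Seq[Partition]): Partition =
    Partition(ps.map(_.yyyy).max, ps.map(_.mm).max)

  // Correct: compare (yyyy, mm) as a pair, so the month is the
  // latest month within the latest year.
  def latest(ps: Seq[Partition]): Option[Partition] =
    if (ps.isEmpty) None else Some(ps.maxBy(p => (p.yyyy, p.mm)))

  def main(args: Array[String]): Unit = {
    val ps = Seq(Partition(2023, 11), Partition(2023, 12), Partition(2024, 1), Partition(2024, 2))
    println(naive(ps))  // Partition(2024,12): not an existing partition
    println(latest(ps)) // Some(Partition(2024,2))
  }
}
```

The pair returned by `latest` is what would be substituted for yyyy and mm in the `where` filter of the snippet in the question.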

Upvotes: 1

Views: 740

Answers (1)

Joe Widen

Reputation: 2448

Actually, if you partition on yyyy and mm, then getting the max year and month will be a metadata-only operation that only reads the transaction log, so it should be really quick.
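To make the metadata-only lookup explicit instead of relying on the optimizer, one option is to take the table's data-file paths and read the partition values out of them. This is a minimal sketch that assumes the default Hive-style directory layout (`yyyy=.../mm=.../file`). That layout may not hold if Delta column mapping is enabled, so check your table first. The object name, regex, and example paths are hypothetical:

```scala
object LatestPartitionFromPaths {
  // Matches Hive-style partition directories such as
  // ".../yyyy=2024/mm=02/part-00000.snappy.parquet" (assumed layout).
  private val PartitionDirs = """.*/yyyy=(\d+)/mm=(\d+)/[^/]+""".r

  // Extract (yyyy, mm) from one data-file path, if it follows that layout.
  def partitionOf(path: String): Option[(Int, Int)] = path match {
    case PartitionDirs(y, m) => Some((y.toInt, m.toInt))
    case _                   => None
  }

  // Latest partition across all listed files, compared as (yyyy, mm) pairs.
  def latest(paths: Seq[String]): Option[(Int, Int)] = {
    val parts = paths.flatMap(partitionOf).distinct
    if (parts.isEmpty) None else Some(parts.max)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical file paths for illustration only.
    val paths = Seq(
      "s3://bucket/url_delta/yyyy=2023/mm=12/part-00000.snappy.parquet",
      "s3://bucket/url_delta/yyyy=2024/mm=01/part-00000.snappy.parquet",
      "s3://bucket/url_delta/yyyy=2024/mm=02/part-00001.snappy.parquet"
    )
    println(latest(paths)) // Some((2024,2))
  }
}
```

In Spark, the path list could come from `spark.read.format("delta").load("url_delta").inputFiles`. As far as I know, Delta resolves that list from its transaction log rather than by scanning the data files. The resulting pair can then be used in the `where` filter from the question.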

Upvotes: 3
