sho

Reputation: 224

Checking partition access for an Apache Iceberg table with the Spark explain plan

How can we check whether a query is accessing partitions correctly? Is there a way to run an explain plan on an Iceberg table that shows this?

Example: I have created an Iceberg table partitioned on month(tpep_pickup_datetime).

The query I'm running from Spark is:

df = spark.sql("select *  from iceberg.nyc_yellowtaxi_tripdata_v2 where tpep_pickup_datetime = '2022-01-01 00:35:40' ")

I just want to confirm whether partitioning is working: which partition was accessed, or whether there was a full table scan. I tried running df.explain(), but it does not show any partition information for the filters I added.

    Spark Running
== Physical Plan ==
*(1) Filter (isnotnull(tpep_pickup_datetime#217) AND (tpep_pickup_datetime#217 = 2022-01-01 00:35:40))
+- *(1) ColumnarToRow
   +- BatchScan[vendorid#216L, tpep_pickup_datetime#217, tpep_dropoff_datetime#218, passenger_count#219, trip_distance#220, ratecodeid#221, store_and_fwd_flag#222, pulocationid#223L, dolocationid#224L, payment_type#225L, fare_amount#226, extra#227, mta_tax#228, tip_amount#229, tolls_amount#230, improvement_surcharge#231, total_amount#232, congestion_surcharge#233, airport_fee#234] iceberg.nyc_yellowtaxi_tripdata_v2 [filters=tpep_pickup_datetime IS NOT NULL, tpep_pickup_datetime = 1640997340000000] RuntimeFilters: []

Upvotes: 4

Views: 399

Answers (1)

anirban roychowdhury

Reputation: 13

I have the exact same issue.

I am using Athena as my query engine. The closest explanation I have been able to find is that the engine may be locating the data files directly by reading Iceberg's manifest metadata.

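Note that the numbers after the # in your own explain output (for example tpep_pickup_datetime#217) are Spark's internal expression IDs for the columns, so they do not tell you which files or partitions were read.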
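
One value in the explain output that can be checked by hand is the pushed-down filter literal, 1640997340000000. The sketch below assumes it is microseconds since the Unix epoch in UTC, and that the month partition value is counted as months since 1970-01, which is my reading of Iceberg's month transform. The helper names micros_to_utc and month_transform are hypothetical, made up for illustration only.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_utc(micros: int) -> datetime:
    """Interpret a filter literal as microseconds since the Unix epoch (assumed UTC)."""
    return EPOCH + timedelta(microseconds=micros)


def month_transform(ts: datetime) -> int:
    """Months since 1970-01 (assumed to match Iceberg's month transform)."""
    return (ts.year - 1970) * 12 + (ts.month - 1)


literal = 1640997340000000  # copied from the [filters=...] part of the plan
ts = micros_to_utc(literal)

print(ts.isoformat())       # 2022-01-01T00:35:40+00:00
print(month_transform(ts))  # 624
```

If those assumptions hold, the equality filter can only match rows in the single month partition for 2022-01, so a pruned scan should touch only that partition's files.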

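Since Iceberg uses hidden partitioning, the partition never appears as a column in the physical plan. Iceberg takes the predicates you provide (the filters=... part of the BatchScan), matches them against its own metadata while planning the scan, and reads only the files that can contain matching rows.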
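
Since the plan itself does not list partitions, one way to check is to ask Iceberg directly. The following is a rough sketch, not a confirmed recipe: it assumes your Iceberg/Spark version exposes the .partitions metadata table and the _file metadata column, and partition_check_queries is a hypothetical helper that only builds the SQL strings.

```python
def partition_check_queries(table: str, predicate: str) -> dict:
    """Build two diagnostic queries (assumes Iceberg's Spark metadata features are available)."""
    return {
        # All partitions of the table, with row and file counts per partition.
        "partitions": (
            f"SELECT partition, record_count, file_count FROM {table}.partitions"
        ),
        # Data files that actually contribute rows for the given predicate.
        "files_read": (
            f"SELECT _file, count(*) AS row_count FROM {table} "
            f"WHERE {predicate} GROUP BY _file"
        ),
    }


queries = partition_check_queries(
    "iceberg.nyc_yellowtaxi_tripdata_v2",
    "tpep_pickup_datetime = '2022-01-01 00:35:40'",
)

for name, sql in queries.items():
    print(name, "->", sql)
    # With a live session: spark.sql(sql).show(truncate=False)
```

If the files_read result only lists files under the 2022-01 partition, while the partitions query shows many more months, the filter is being pruned rather than causing a full table scan.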

Upvotes: 1
