jlaralopez

Reputation: 35

How to pushdown partitioning when reading from jdbc Spark

I want to read from a table via a JDBC connection, using both filtering with a WHERE clause and partitioning on another column with the options partitionColumn, lowerBound, upperBound and numPartitions.

Currently both filtering and partitioning work, however I notice that the partitioning is not being pushed down: I get a SELECT * FROM mytable as a CTE, and then the partitioning is applied on top of that CTE, resulting in 200 queries (200 given by numPartitions) like:

SELECT * FROM (SELECT * FROM TABLE WHERE FILTER_COLUMN = value) TBL WHERE PARTITION_COLUMN >= lowerBound_value AND PARTITION_COLUMN < upperBound_value;

Instead, I would like the 200 queries (200 given by numPartitions) to look like:

SELECT * FROM (SELECT * FROM TABLE WHERE FILTER_COLUMN = value AND PARTITION_COLUMN >= lowerBound_value AND PARTITION_COLUMN < upperBound_value) TBL;

Is there a way to push the partitioning down into the CTE? The idea is to avoid the double SELECT *: the table is big, and running that query 200 times is slow. This is my code:

#Gather the value for lowerBound
query_min = ''' (SELECT 
    MIN(PARTITION_COLUMN) AS MIN_VALUE
    FROM SCHEMA.TABLE )TBL '''

min_df = sparkSession.read \
    .format("jdbc") \
    .option("dbtable", query_min) \
    .load()
minval = min_df.head(1)[0][0]


#Gather the value for upperBound
query_max = ''' (SELECT 
    MAX(PARTITION_COLUMN) AS MAX_VALUE
    FROM SCHEMA.TABLE )TBL '''

max_df = sparkSession.read \
    .format("jdbc") \
    .option("dbtable", query_max) \
    .load()
maxval = max_df.head(1)[0][0]

#Read the table filtering with FILTER_COLUMN, and partitioning with PARTITION_COLUMN
query = ''' (SELECT * FROM SCHEMA.TABLE
WHERE FILTER_COLUMN = VALUE )TBL '''

df = sparkSession.read.format("jdbc")\
                    .option("dbtable", query)\
                    .option("partitionColumn", "PARTITION_COLUMN")\
                    .option("numPartitions", "200")\
                    .option("lowerBound", minval)\
                    .option("upperBound", maxval)\
                    .load()
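For context on where the 200 outer WHERE clauses come from: Spark's JDBC reader splits the [lowerBound, upperBound] range into numPartitions stride-sized predicates, one per partition. A simplified sketch of that logic (`partition_predicates` is a hypothetical helper for illustration, not a Spark API; the real implementation lives inside Spark's JDBC relation code):

```python
def partition_predicates(column, lower, upper, num_partitions):
    """Simplified sketch of how Spark's JDBC source derives one WHERE
    clause per partition from lowerBound/upperBound/numPartitions.
    Hypothetical helper for illustration only, not part of the Spark API."""
    stride = (upper - lower) // num_partitions
    predicates, current = [], lower + stride
    for i in range(num_partitions):
        if i == 0:
            # The first partition also picks up NULLs and values below the range.
            predicates.append(f"{column} < {current} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # The last partition is open-ended above, so no rows are lost.
            predicates.append(f"{column} >= {current - stride}")
        else:
            predicates.append(
                f"{column} >= {current - stride} AND {column} < {current}")
        current += stride
    return predicates

for p in partition_predicates("PARTITION_COLUMN", 1, 1000, 4):
    print(p)
```

With numPartitions set to 200, this same stride logic yields the 200 outer predicates you observed wrapped around the dbtable subquery.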

Upvotes: 0

Views: 172

Answers (1)

Kombajn zbożowy

Reputation: 10693

Nothing to do here. Both queries, the one that is run and the one you would like to have, are equivalent. Predicate pushdown happens under the hood, and any DBMS optimizer will be smart enough to run a single scan over the table with all filters applied, regardless of whether they appear in the inner or the outer query.

You can ask the database to explain the query plan to prove it. Try this in your database or on SQL Fiddle (the example here is for Postgres):

create table foo (a int, b int);

insert into foo
  select generate_series, generate_series * 2
  from generate_series(1, 10000);

explain
  select * from foo where b % 2 = 0 and a > 1 and a < 1000;

explain
  select * from (select * from foo where b % 2 = 0) x where a > 1 and a < 1000;

explain
  select * from (select * from foo) x where b % 2 = 0 and a > 1 and a < 1000;

explain
  select * from (select * from (select * from foo where a > 1) x where a < 1000) x where b % 2 = 0;

The output of each of the explain statements is:

Seq Scan on foo (cost=0.00..248.40 rows=1 width=8)
Filter: ((a > 1) AND (a < 1000) AND ((b % 2) = 0))
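The same equivalence is easy to check programmatically. As a rough parallel to the Postgres example above, SQLite (via Python's built-in sqlite3 module) also flattens the subquery, so the flat and the nested form produce the identical plan (same toy table `foo` as above):

```python
import sqlite3

# In-memory SQLite database with the same toy table as the Postgres example.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE foo (a INTEGER, b INTEGER)")
con.executemany("INSERT INTO foo VALUES (?, ?)",
                [(i, i * 2) for i in range(1, 10001)])

def plan(sql):
    # EXPLAIN QUERY PLAN returns rows whose last column describes each step.
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

flat = plan("SELECT * FROM foo WHERE b % 2 = 0 AND a > 1 AND a < 1000")
nested = plan("SELECT * FROM (SELECT * FROM foo WHERE b % 2 = 0) x "
              "WHERE a > 1 AND a < 1000")

# The optimizer flattens the subquery: both plans are a single scan of foo.
print(flat == nested)
```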

Upvotes: 0

Related Questions