Martin Studer

Reputation: 2321

Spark SQL proxy tables with predicate pushdown

Is there a way in Apache Spark to create a Spark SQL proxy table that simply proxies to an underlying (custom) data source?

I have a custom data source that supports predicate pushdown by implementing org.apache.spark.sql.sources.PrunedFilteredScan, and I would now like to run Spark SQL queries against that data source with the filter predicates passed through (pushed down) to it. Registering the data source as an ordinary temporary table (using sqlContext.read.format("mydatasource").load().createOrReplaceTempView("myTable")) is not an option, as this will ultimately pull all data into Spark.
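For context, the relation implements the standard PrunedFilteredScan contract, roughly along these lines (the schema and the actual fetch against the backend are placeholders, not my real implementation):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{Row, SQLContext}
    import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}
    import org.apache.spark.sql.types.StructType

    // Sketch of the relation backing "mydatasource"; schema and fetch
    // logic stand in for the real implementation.
    class MyRelation(override val sqlContext: SQLContext,
                     override val schema: StructType)
      extends BaseRelation with PrunedFilteredScan {

      // Spark passes in the pruned column list and whatever filters it
      // can express; the relation is expected to apply them at the
      // source instead of materializing everything in Spark.
      override def buildScan(requiredColumns: Array[String],
                             filters: Array[Filter]): RDD[Row] = {
        // Translate `filters` into the backend's native query language
        // and fetch only `requiredColumns` -- the pushdown itself.
        ???
      }
    }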

Upvotes: 0

Views: 241

Answers (1)

zero323

Reputation: 330063

Neither temporary views (Dataset.createTempView and Dataset.createOrReplaceTempView) nor external tables (Catalog.createExternalTable before Spark 2.2, Catalog.createTable in 2.2 and later) should pull all data into Spark, and all of these options support predicate pushdown to the same extent as the underlying source.
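You can verify this yourself by registering the view and inspecting the physical plan: when the source implements PrunedFilteredScan, the predicate typically shows up as a pushed filter in the scan node rather than as a separate Filter over a full scan. A minimal sketch, assuming the Spark 2.x SparkSession entry point spark and a hypothetical column id:

    // Register the custom source as a temporary view, exactly as in
    // the question.
    val df = spark.read.format("mydatasource").load()
    df.createOrReplaceTempView("myTable")

    // The scan node of the physical plan lists PushedFilters when the
    // source accepts the predicate, confirming that the view does not
    // force a full materialization.
    spark.sql("SELECT id FROM myTable WHERE id > 42").explain()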

Upvotes: 1
