Reputation: 2321
Is there a way in Apache Spark to create a Spark SQL proxy table that simply proxies to an underlying (custom) data source?
I have a custom data source that supports predicate pushdown by implementing org.apache.spark.sql.sources.PrunedFilteredScan, and I would now like to use Spark SQL against that data source so that filter predicates are passed through (pushed down) to it. Registering the data source as an ordinary temporary view (using sqlContext.read.format("mydatasource").load().createOrReplaceTempView("myTable")) is not an option, as that would ultimately pull all data into Spark.
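For reference, a minimal sketch of what such a source looks like on the implementation side; the schema and the empty scan body here are illustrative placeholders, not the real code:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}
import org.apache.spark.sql.types._

// Hypothetical relation backed by an external store; only the
// pushdown hook is shown, the actual fetch logic is elided.
class MyRelation(override val sqlContext: SQLContext)
    extends BaseRelation with PrunedFilteredScan {

  override def schema: StructType = StructType(Seq(
    StructField("id", LongType),
    StructField("name", StringType)))

  // Spark passes in the columns it needs and every filter it could
  // translate; the source is expected to apply them and return only
  // the matching rows (e.g. by compiling filters such as
  // GreaterThan("id", 42L) into the store's native query language).
  override def buildScan(requiredColumns: Array[String],
                         filters: Array[Filter]): RDD[Row] =
    sqlContext.sparkContext.emptyRDD[Row]
}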
Upvotes: 0
Views: 241
Reputation: 330063
Neither temporary views (Dataset.createTempView and Dataset.createOrReplaceTempView) nor external tables (Catalog.createExternalTable before 2.2, Catalog.createTable since 2.2) should pull all data into Spark, and all of these options support predicate pushdown to the same extent as the underlying source.
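For illustration, a rough sketch of both options using the SparkSession API (the format name "mydatasource" is taken from the question; the id column is an assumed example):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("pushdown-demo").getOrCreate()

// Option 1: a temporary view. load() is lazy and materializes nothing.
spark.read.format("mydatasource").load().createOrReplaceTempView("myTable")

// Option 2 (Spark 2.2+): a persistent table over the same source.
// spark.catalog.createTable("myPersistentTable", "mydatasource")

// The WHERE predicate is offered to the source through
// PrunedFilteredScan.buildScan(requiredColumns, filters).
val filtered = spark.sql("SELECT id FROM myTable WHERE id > 42")

// In the printed physical plan, predicates handed to the source
// appear under PushedFilters on the scan node.
filtered.explain(true)

Any predicate that does not show up under PushedFilters is one Spark could not translate into a data source Filter; Spark then evaluates it itself on the rows returned by the scan.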
Upvotes: 1