Reputation: 11
val url = "jdbc:sqlserver://XXXXXX"
val properties = new Properties
// properties.setProperty(JDBCOptions.JDBC_PUSHDOWN_PREDICATE, "false") add but still not work
val df = spark.read.jdbc(url, "movies", properties)
df.filter("rated == true").show()
The code is quite simple, and it fails with this error:
Job aborted due to stage failure. Caused by: SQLServerException: Invalid column name 'true'.
The table is in SQL Server and the table schema is
CREATE TABLE [dbo].[movies](
    [movieId] [int] NULL,
    [title] [nvarchar](max) NULL,
    [releaseDate] [date] NULL,
    [rated] [bit] NULL,
    [screenedOn] [datetime2](7) NULL,
    [ticketPrice] [decimal](10, 2) NULL
)
Tracing through JdbcUtils and JdbcDialect, the bit type is translated to BooleanType, which is good. However, the filter is pushed down into JDBCRDD, and because of a defect in MsSqlServerDialect's compileValue() method, the generated WHERE clause does not rewrite boolean literals to 1 and 0 as T-SQL requires. T-SQL has no true/false literals, so in a clause like WHERE rated = true the token true is parsed as a column name, hence the error above. Writing a new dialect does not help either: AggregatedDialect does not override compileValue() to loop over its contained dialects, so it still fails. That is something I think needs to be fixed in the current Spark code.
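To illustrate, here is a minimal sketch of the kind of custom dialect I mean (the object name is mine; it assumes a Spark version where JdbcDialect exposes an overridable compileValue(), i.e. 2.3+):

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Sketch of a workaround dialect (hypothetical name): compile boolean
// literals to 1/0 so the generated WHERE clause is valid T-SQL.
object SqlServerBooleanDialect extends JdbcDialect {

  // Claim SQL Server JDBC URLs.
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:sqlserver")

  // T-SQL has no true/false literals; bit columns compare against 1/0.
  override def compileValue(value: Any): Any = value match {
    case b: Boolean => if (b) 1 else 0
    case other      => super.compileValue(other)
  }
}

// Register before reading, so Spark picks it up for matching URLs.
JdbcDialects.registerDialect(SqlServerBooleanDialect)

But as noted above, this does not take effect: the built-in MsSqlServerDialect also matches the URL, so Spark wraps both in an AggregatedDialect, and AggregatedDialect falls back to the default compileValue() instead of delegating to the contained dialects.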
My question is about the pushDownPredicate option described in the docs, which controls whether filter logic is pushed down into the WHERE clause:
pushDownPredicate: The option to enable or disable predicate push-down into the JDBC data source. The default value is true, in which case Spark will push down filters to the JDBC data source as much as possible. Otherwise, if set to false, no filter will be pushed down to the JDBC data source and thus all filters will be handled by Spark. Predicate push-down is usually turned off when the predicate filtering is performed faster by Spark than by the JDBC data source.
But even with properties.setProperty(JDBCOptions.JDBC_PUSHDOWN_PREDICATE, "false") it still fails with the same error. How does pushDownPredicate work in spark.read.jdbc, and do I understand correctly that pushDownPredicate should prevent the filter from being pushed down to JDBCRDD?
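For reference, the option can equivalently be passed through the DataFrameReader (assuming Spark 2.4+, where pushDownPredicate was introduced):

// Same read, setting the option on the reader itself
// (assumes Spark 2.4+, where the pushDownPredicate option exists).
val df = spark.read
  .option("pushDownPredicate", "false")
  .jdbc(url, "movies", properties)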
Here is the related logical plan:
Project
+- Filter (isnotnull(rated#91) && (cast(rated#91 as string) = true))
   +- Relation[movieId#88,title#89,releaseDate#90,rated#91,screenedOn#92,ticketPrice#93] JDBCRelation(dbo.[movies]) [numPartitions=1]
Upvotes: 0
Views: 682
Reputation: 11
We had this issue moving from Databricks runtime V9 to V10. We were using PySpark, and including "pushDownPredicate": "false" in the connection properties solved it for us.
properties = {
    "user": jdbcUsername,
    "password": jdbcPassword,
    "driver": driver,
    "pushDownPredicate": "false"
}
df = spark.read.jdbc(url=url, table=controltable, properties=properties)
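With pushDownPredicate set to false, Spark pulls the rows back and evaluates the filter itself, so the dialect's faulty boolean-to-SQL translation is never exercised.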
Upvotes: 1