Reputation: 11
val url = "jdbc:sqlserver://XXXXXX"
val properties = new Properties
// properties.setProperty(JDBCOptions.JDBC_PUSHDOWN_PREDICATE, "false") add but still not work
val df = spark.read.jdbc(url, "movies", properties)
df.filter("rated == true").show()
The code is quite simple, and it fails with this error:
Job aborted due to stage failure. Caused by: SQLServerException: Invalid column name 'true'.
The table is in SQL Server and the table schema is
CREATE TABLE [dbo].[movies](
    [movieId] [int] NULL,
    [title] [nvarchar](max) NULL,
    [releaseDate] [date] NULL,
    [rated] [bit] NULL,
    [screenedOn] [datetime2](7) NULL,
    [ticketPrice] [decimal](10, 2) NULL
)
Tracing through JdbcUtils and JdbcDialect, the bit type is translated to BooleanType, which is good. However, the filter is pushed down into JDBCRDD, and because of a defect in MsSqlServerDialect's compileValue() method, the generated WHERE clause does not rewrite boolean literals to 1 and 0 as T-SQL requires. T-SQL has no true/false literals, so in a clause like WHERE rated = true the token true is parsed as a column name, hence the error above. Writing a new dialect does not help either: AggregatedDialect does not override compileValue() to loop over its contained dialects, so it still fails. That is something I think needs to be fixed in the current Spark code.
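To illustrate, here is a minimal sketch of the kind of custom dialect I mean (the object name is mine; it assumes a Spark version where JdbcDialect exposes an overridable compileValue(), i.e. 2.3+):

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Sketch of a workaround dialect (hypothetical name): compile boolean
// literals to 1/0 so the generated WHERE clause is valid T-SQL.
object SqlServerBooleanDialect extends JdbcDialect {

  // Claim SQL Server JDBC URLs.
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:sqlserver")

  // T-SQL has no true/false literals; bit columns compare against 1/0.
  override def compileValue(value: Any): Any = value match {
    case b: Boolean => if (b) 1 else 0
    case other      => super.compileValue(other)
  }
}

// Register before reading, so Spark picks it up for matching URLs.
JdbcDialects.registerDialect(SqlServerBooleanDialect)

But as noted above, this does not take effect: the built-in MsSqlServerDialect also matches the URL, so Spark wraps both in an AggregatedDialect, and AggregatedDialect falls back to the default compileValue() instead of delegating to the contained dialects.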
My question is about the pushDownPredicate option described in the docs, which controls whether filter logic is pushed down into the WHERE clause:
pushDownPredicate: The option to enable or disable predicate push-down into the JDBC data source. The default value is true, in which case Spark will push down filters to the JDBC data source as much as possible. Otherwise, if set to false, no filter will be pushed down to the JDBC data source and thus all filters will be handled by Spark. Predicate push-down is usually turned off when the predicate filtering is performed faster by Spark than by the JDBC data source.
But even with properties.setProperty(JDBCOptions.JDBC_PUSHDOWN_PREDICATE, "false") it still fails with the same error. How does pushDownPredicate work in spark.read.jdbc, and do I understand correctly that pushDownPredicate should prevent the filter from being pushed down to JDBCRDD?
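For reference, the option can equivalently be passed through the DataFrameReader (assuming Spark 2.4+, where pushDownPredicate was introduced):

// Same read, setting the option on the reader itself
// (assumes Spark 2.4+, where the pushDownPredicate option exists).
val df = spark.read
  .option("pushDownPredicate", "false")
  .jdbc(url, "movies", properties)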
Here is the related logical plan:
Project
+- Filter (isnotnull(rated#91) && (cast(rated#91 as string) = true))
   +- Relation[movieId#88,title#89,releaseDate#90,rated#91,screenedOn#92,ticketPrice#93] JDBCRelation(dbo.[movies]) [numPartitions=1]
Upvotes: 0
Views: 682
Reputation: 11
We had this issue moving from Databricks runtime V9 to V10. We were using PySpark, and including "pushDownPredicate": "false" in the connection properties solved it for us.
properties = {
    "user": jdbcUsername,
    "password": jdbcPassword,
    "driver": driver,
    "pushDownPredicate": "false"
}
df = spark.read.jdbc(url=url, table=controltable, properties=properties)
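With pushDownPredicate set to false, Spark pulls the rows back and evaluates the filter itself, so the dialect's faulty boolean-to-SQL translation is never exercised.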
Upvotes: 1