Hyun

Reputation: 606

In Spark 2.4, does Spark JDBC no longer allow specifying a built-in function as the partitionColumn?

I am trying to upgrade from Spark 2.2.1 to 2.4.0. In Spark 2.2, the following worked fine.

val query = "(select id, myPartitionColumnString from myTable) query"
val splitColumn = "CHECKSUM(myPartitionColumnString)"
spark.read.jdbc(jdbcUrl, query, splitColumn, lowerBound, upperBound, numPartitions, connectionProperties)

But in Spark 2.4, it causes an error like this:

User-defined partition column CHECKSUM(myPartitionColumnString) not found in the JDBC relation: struct<id: int, myPartitionColumnString: string>

I'm sure CHECKSUM is defined.

Upvotes: 4

Views: 1306

Answers (1)

dimon222

Reputation: 172

They removed it during the introduction of the "pass direct SQL query" functionality; the breaking change landed in 2.4.0. It was more of a hack, and there's no way to achieve this now. You can still use it in 2.3, though.
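
For reference, a minimal sketch of what Spark 2.4 does accept: the partition column has to be a plain column that exists in the relation's resolved schema (here the numeric id column from the question's query). The URL, bounds, and partition count below are placeholders, not values from the question.

import java.util.Properties
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-partitioning").getOrCreate()

// Placeholder connection details -- adjust to your environment.
val jdbcUrl = "jdbc:sqlserver://host:1433;databaseName=mydb"
val connectionProperties = new Properties()

val query = "(select id, myPartitionColumnString from myTable) query"

// In 2.4 the partition column must appear in the resolved schema
// (struct<id: int, myPartitionColumnString: string>), so an expression
// like CHECKSUM(...) is rejected, while a real column such as "id" works.
val df = spark.read.jdbc(jdbcUrl, query, "id", 0L, 1000000L, 8, connectionProperties)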

PS: if somebody finds another way to achieve the same behaviour, please contact me; I'm very interested.

Upvotes: 3
