AI Joes

Reputation: 69

Other ways to make Spark read JDBC data in partitions

When using Spark SQL to read JDBC data, Spark uses only one partition by default, so when the table is big, reading is very slow.
I know there are two ways to make partitions (a sketch of both follows this list):
1. set partitionColumn, lowerBound, upperBound and numPartitions in the options;
2. pass an array of predicates (offset ranges), one WHERE clause per partition.
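
For reference, a minimal sketch of both ways; the URL, table, column names and bounds below are placeholders, not my real schema:

    import java.util.Properties
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("jdbc-partitions").getOrCreate()

    // Way 1: a numeric partition column with bounds. Spark splits
    // [lowerBound, upperBound) into numPartitions range queries.
    val byColumn = spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://db-host/mydb")  // placeholder URL
      .option("dbtable", "big_table")                   // placeholder table
      .option("partitionColumn", "id")                  // needs a numeric column
      .option("lowerBound", "1")
      .option("upperBound", "1000000")
      .option("numPartitions", "8")
      .load()

    // Way 2: explicit predicates -- Spark runs one query per WHERE clause.
    val predicates = Array(
      "created_at <  '2020-01-01'",
      "created_at >= '2020-01-01' AND created_at < '2021-01-01'",
      "created_at >= '2021-01-01'"
    )
    val byPredicates = spark.read.jdbc(
      "jdbc:postgresql://db-host/mydb", "big_table",
      predicates, new Properties())  // put user/password in the Properties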
But in my situation, the table has no INT column, and no string column that can easily be split into offset ranges, so neither of these two ways works.
Since these two ways won't work for me, is there any other way to make Spark read JDBC data in partitions?

Upvotes: 1

Views: 1506

Answers (1)

Devang

Reputation: 37

Take a look at the question linked below. The solution is to use a pseudorandom column computed in the database and partition on the number of rows you want each partition to read; a sketch of that idea follows the link.

Spark JDBC pseudocolumn isn't working
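
A minimal sketch of that idea, assuming an Oracle source (the URL and table name are placeholders, and ORA_HASH is Oracle-specific; other databases need their own hash function). ORA_HASH(ROWID, n - 1) assigns every row a bucket from 0 to n - 1, which gives Spark the numeric partition column the original table lacks:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("jdbc-hash-partitions").getOrCreate()

    val numPartitions = 8
    // Synthesize a numeric bucket column in a pushed-down subquery:
    // ORA_HASH(ROWID, n - 1) spreads rows roughly evenly over buckets 0..n-1.
    val subquery =
      s"(SELECT t.*, ORA_HASH(ROWID, ${numPartitions - 1}) AS bucket FROM big_table t)"

    val df = spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@db-host:1521:orcl")  // placeholder URL
      .option("dbtable", subquery)
      .option("partitionColumn", "bucket")
      .option("lowerBound", "0")
      .option("upperBound", numPartitions.toString)  // buckets run 0..numPartitions-1
      .option("numPartitions", numPartitions.toString)
      .load()

Each of the resulting partitions then reads roughly tableRows / numPartitions rows, regardless of what columns the table actually has.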

Upvotes: 0
