Reputation: 69
When using Spark SQL to read JDBC data, Spark starts only 1 partition by default. But when the table is big, the read is very slow.
I know there are two ways to make partitions (a sketch of both follows this list):
1. set partitionColumn, lowerBound, upperBound and numPartitions in the options;
2. set an array of predicates (one WHERE clause per partition) in the options;
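For reference, a minimal sketch of both mechanisms in Scala. The JDBC URL, credentials, the `orders` table, and the `id` / `created_at` columns are all placeholders, not from any real schema:

```scala
import java.util.Properties
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-partitioned-read").getOrCreate()

val url = "jdbc:postgresql://db-host:5432/mydb"  // hypothetical URL
val props = new Properties()
props.setProperty("user", "reader")              // hypothetical credentials
props.setProperty("password", "secret")

// Way 1: range partitioning on a numeric column.
// Spark issues numPartitions queries, each covering a slice of [lowerBound, upperBound].
val byRange = spark.read
  .format("jdbc")
  .option("url", url)
  .option("dbtable", "orders")        // hypothetical table
  .option("partitionColumn", "id")    // must be numeric (or date/timestamp in Spark 2.4+)
  .option("lowerBound", "1")
  .option("upperBound", "1000000")
  .option("numPartitions", "8")
  .load()

// Way 2: explicit predicates, one per partition.
// Each WHERE clause becomes its own query, run in its own partition.
val predicates = Array(
  "created_at <  '2020-01-01'",
  "created_at >= '2020-01-01' AND created_at < '2020-07-01'",
  "created_at >= '2020-07-01'"
)
val byPredicates = spark.read.jdbc(url, "orders", predicates, props)
```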
But my situation is:
my JDBC table has no INT column, and no other column that can easily be split into ranges, so neither of these two ways applies.
Since these two ways won't work in my situation, is there any other way to make Spark read the JDBC data in partitions?
Upvotes: 1
Views: 1506
Reputation: 37
Take a look at this question. The solution is to use a pseudorandom column computed in the database and partition on the number of rows you want each read to cover (sketched below).
Spark JDBC pseudocolumn isn't working
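A minimal sketch of that idea, assuming PostgreSQL and a hypothetical `orders` table with a text key `order_key`. `hashtext()` is PostgreSQL-specific; other databases have analogous functions (e.g. Oracle's ORA_HASH, MySQL's CRC32):

```scala
import java.util.Properties
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-hash-partitions").getOrCreate()

val url = "jdbc:postgresql://db-host:5432/mydb"  // hypothetical URL
val props = new Properties()
props.setProperty("user", "reader")              // hypothetical credentials
props.setProperty("password", "secret")

// Derive a synthetic partition key inside the database: hash any stable
// column into n buckets and generate one predicate per bucket, so each
// Spark partition reads roughly 1/n of the rows.
val n = 8
val predicates = (0 until n).map { i =>
  s"abs(hashtext(order_key)) % $n = $i"  // order_key is a hypothetical key column
}.toArray

val df = spark.read.jdbc(url, "orders", predicates, props)
```

Because the hash is evaluated by the database, this works even when the table has no numeric column that Spark's range partitioning could use.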
Upvotes: 0