Dev

Reputation: 13773

Partitioning in spark while reading from RDBMS via JDBC

I am running spark in cluster mode and reading data from RDBMS via JDBC.

As per the Spark docs, the partitioning parameters (partitionColumn, lowerBound, upperBound, numPartitions) describe how to partition the table when reading in parallel from multiple workers.

These are optional parameters.

What would happen if I don't specify them?
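For concreteness, a minimal sketch of the kind of read in question, with none of the partitioning options set. The URL, table, and credentials here are hypothetical placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-read").getOrCreate()

// Plain JDBC read: no partitionColumn / lowerBound / upperBound /
// numPartitions and no predicates are supplied.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://db-host:5432/mydb") // hypothetical URL
  .option("dbtable", "public.events")                   // hypothetical table
  .option("user", "reader")
  .option("password", "secret")
  .load()
```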

Upvotes: 15

Views: 23593

Answers (1)

zero323

Reputation: 330413

If you don't specify either {partitionColumn, lowerBound, upperBound, numPartitions} or {predicates}, Spark will use a single executor and create a single non-empty partition. All data will be processed in a single transaction, and reads will be neither distributed nor parallelized.
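To get a distributed read, the four options have to be supplied together. A sketch, assuming a numeric column `id` whose bounds are roughly known; the connection details and names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-parallel-read").getOrCreate()

// Parallel JDBC read: Spark splits [lowerBound, upperBound) on
// partitionColumn into numPartitions ranges and issues one range-bounded
// query per partition, so the reads can run on multiple executors.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://db-host:5432/mydb") // hypothetical URL
  .option("dbtable", "public.events")                   // hypothetical table
  .option("user", "reader")
  .option("password", "secret")
  .option("partitionColumn", "id") // must be numeric, date, or timestamp
  .option("lowerBound", "1")
  .option("upperBound", "1000000")
  .option("numPartitions", "10")   // 10 concurrent range queries
  .load()
```

Note that lowerBound and upperBound only control how the ranges are sliced, not which rows are read; rows outside the bounds still land in the first and last partitions. Alternatively, the `DataFrameReader.jdbc(url, table, predicates, connectionProperties)` overload takes an explicit array of WHERE-clause predicates, one per partition.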

Upvotes: 30
