Reputation: 651
I am creating a Spark job, and am wondering whether there is any performance benefit to reading a table via spark.read.table("table")
vs spark.sql("select * from table").
Or does Spark's logical plan end up the same regardless?
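For reference, a minimal PySpark sketch of the two approaches, where "table" is just a placeholder for an existing catalog table; explain(True) prints the parsed, analyzed, optimized logical plans and the physical plan for each, so they can be compared directly:

```python
# Compare the plans produced by the DataFrameReader API vs a SQL string.
# "table" is a placeholder for an existing catalog table name.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("plan-comparison").getOrCreate()

df_read = spark.read.table("table")         # DataFrameReader API
df_sql = spark.sql("select * from table")   # SQL string

# explain(True) prints the logical plans (parsed, analyzed, optimized)
# and the physical plan for each DataFrame.
df_read.explain(True)
df_sql.explain(True)
```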
Upvotes: 0
Views: 891
Reputation: 589
If you use spark.read.jdbc, you can specify a partition column to read the table in parallel, giving Spark multiple partitions to work on. Whether this is faster depends on the RDBMS and the physical design of the table, but it will greatly reduce the amount of memory needed by any single executor.
https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
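A hedged sketch of what that can look like in PySpark; the JDBC URL, credentials, table name, partition column, and bounds below are all illustrative placeholders, and in practice the bounds would come from the min/max of the partition column:

```python
# Sketch of a parallel JDBC read; connection details and bounds are
# placeholders, not a working configuration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-partitioned-read").getOrCreate()

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://dbhost:5432/mydb")  # placeholder URL
    .option("dbtable", "table")                           # placeholder table
    .option("user", "user")                               # placeholder creds
    .option("password", "password")
    # Partitioning options: Spark issues numPartitions queries, each
    # covering one slice of [lowerBound, upperBound) on partitionColumn,
    # so no single executor has to pull the whole table.
    .option("partitionColumn", "id")  # must be numeric, date, or timestamp
    .option("lowerBound", "1")
    .option("upperBound", "1000000")
    .option("numPartitions", "8")
    .load()
)
```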
Upvotes: 1