Cory Grinstead

Reputation: 651

Spark read table performance optimization

I am creating a Spark job and am wondering whether there are any performance benefits to reading a table via spark.sqlContext().read("table") vs. spark.sql("select * from table"), or does Spark's logical plan end up the same regardless?
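To compare, I can print the plans each approach produces (a minimal Scala sketch, assuming a table named "table" is registered in the catalog):

    val viaTable = spark.table("table")
    val viaSql   = spark.sql("select * from table")

    viaTable.explain(true) // prints parsed, analyzed, optimized, and physical plans
    viaSql.explain(true)   // for a catalog table these typically come out identical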

Upvotes: 0

Views: 891

Answers (1)

Greg

Reputation: 589

If you use spark.read.jdbc, you can specify a partition column to read the table in parallel, giving Spark multiple partitions to work on. Whether this is faster depends on the RDBMS and the physical design of the table, but it will greatly reduce the amount of memory needed by a single executor. See the sketch below.

https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
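A minimal sketch of such a partitioned read (the JDBC URL, credentials, partition column, and bounds below are placeholders, not values from the question):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("partitioned-jdbc-read")
      .getOrCreate()

    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://db-host:5432/mydb") // placeholder URL
      .option("dbtable", "table")
      .option("user", "user")           // placeholder credentials
      .option("password", "password")
      .option("partitionColumn", "id")  // must be a numeric, date, or timestamp column
      .option("lowerBound", "1")        // min expected value of partitionColumn
      .option("upperBound", "1000000")  // max expected value of partitionColumn
      .option("numPartitions", "8")     // Spark opens 8 parallel JDBC reads
      .load()

    println(df.rdd.getNumPartitions)    // should report 8

Note that lowerBound and upperBound only decide the stride of each partition's range; rows outside the bounds are still read (into the first and last partitions), so bounds that don't match the data distribution will produce skewed partitions.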

Upvotes: 1
