Reputation: 651
I am creating a Spark job, and am wondering whether there is any performance benefit to reading a table via spark.read.table("table")
vs spark.sql("select * from table").
Or does Spark's logical plan end up the same regardless?
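For reference, a minimal PySpark sketch of the two approaches, where "table" is just a placeholder for an existing catalog table; explain(True) prints the parsed, analyzed, optimized logical plans and the physical plan for each, so they can be compared directly:

```python
# Compare the plans produced by the DataFrameReader API vs a SQL string.
# "table" is a placeholder for an existing catalog table name.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("plan-comparison").getOrCreate()

df_read = spark.read.table("table")         # DataFrameReader API
df_sql = spark.sql("select * from table")   # SQL string

# explain(True) prints the logical plans (parsed, analyzed, optimized)
# and the physical plan for each DataFrame.
df_read.explain(True)
df_sql.explain(True)
```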
Upvotes: 0
Views: 891
Reputation: 589
If you use spark.read.jdbc, you can specify a partition column to read the table in parallel, giving Spark multiple partitions to work on. Whether this is faster depends on the RDBMS and the physical design of the table, but it will greatly reduce the amount of memory needed by any single executor.
https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
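A hedged sketch of what that can look like in PySpark; the JDBC URL, credentials, table name, partition column, and bounds below are all illustrative placeholders, and in practice the bounds would come from the min/max of the partition column:

```python
# Sketch of a parallel JDBC read; connection details and bounds are
# placeholders, not a working configuration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-partitioned-read").getOrCreate()

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://dbhost:5432/mydb")  # placeholder URL
    .option("dbtable", "table")                           # placeholder table
    .option("user", "user")                               # placeholder creds
    .option("password", "password")
    # Partitioning options: Spark issues numPartitions queries, each
    # covering one slice of [lowerBound, upperBound) on partitionColumn,
    # so no single executor has to pull the whole table.
    .option("partitionColumn", "id")  # must be numeric, date, or timestamp
    .option("lowerBound", "1")
    .option("upperBound", "1000000")
    .option("numPartitions", "8")
    .load()
)
```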
Upvotes: 1