Why spark is slower when compared to sqoop , when it comes to jdbc?

Question

It is understood , while migrating/load from oracle db to hdfs/parquet , it is preferred to use SQOOP rather than SPARK with JDBC driver.

Spark suppose to be 100x faster when processing right ? Then what is wrong with Spark ? Why people prefer SQOOP while loading data from oracle db tables ?

Please suggest me what should i need to do make Spark faster when loading data from oracle.

Alex Ott · Accepted Answer

Spark is fast when it knows how to parallelize queries. If you're just executing single query, then Spark doesn't know what to do. You can improve speed by using parameters lowerBound, upperBound, numPartitions when reading data with spark.read.jdbc, but it depends really on the design of your tables.

You can find more documentation here.

Why spark is slower when compared to sqoop , when it comes to jdbc?

Answers (2)

Related Questions