ben10

Reputation: 83

Time consuming write process of Spark Dataset into the Oracle DB using JDBC driver

I am using Apache Spark to load and process a dataset, and then to output the dataset into an Oracle DB using the JDBC driver.

I am using Spark's JDBC write method to write the Dataset into the database. But it takes roughly the same time to write 10 rows and 10 million rows into different tables of the database. I want to know how to performance-tune this write method in Spark, so that we make wise use of the Spark compute engine; otherwise there is no benefit in using it for fast computation if writing the dataset into the database takes this long. The code to write the 10 rows and the 10M rows is as follows:

With 10 rows to write:

 finalpriceItemParamsGroupTable.distinct().write().mode("append").format("jdbc").option("url", connection).option("dbtable", CI_PRICEITEM_PARM).save();

With 10M rows to write:

finalPritmOutput.distinct().write().mode("append").format("jdbc").option("url", connection).option("dbtable", CI_TXN_DTL).save();

Attaching a screenshot of the Apache Spark dashboard: Spark Stages Screenshot

If someone could help out, it would be appreciated.

Upvotes: 1

Views: 3507

Answers (1)

Mansoor Baba Shaik

Reputation: 492

You can bulk-insert the records in larger batches, rather than 1,000 records at a time (the default), by adding the batchsize option and increasing its value. Larger batches mean fewer JDBC round trips to the database:

   finalPritmOutput.distinct().write()
       .mode("append")
       .format("jdbc")
       .option("url", connection)
       .option("dbtable", CI_TXN_DTL)
       .option("batchsize", "100000")  // rows per JDBC batch; default is 1000
       .save();

Refer to https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases on how to configure your jdbc for better performance.
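Beyond batch size, write parallelism also matters: Spark opens one JDBC connection per partition of the Dataset, so too few partitions will bottleneck the write. A minimal sketch, reusing the question's finalPritmOutput, connection, and CI_TXN_DTL, and assuming the target Oracle instance can absorb around 16 concurrent sessions (the value 16 is an assumption; tune it to your cluster and database):

   // Repartition so rows are spread evenly across 16 write tasks.
   // numPartitions caps concurrent JDBC connections; if the Dataset has
   // more partitions than this, Spark coalesces down to the limit before writing.
   // 16 is an assumed value -- match it to your executors and Oracle's capacity.
   finalPritmOutput.distinct()
       .repartition(16)
       .write()
       .mode("append")
       .format("jdbc")
       .option("url", connection)
       .option("dbtable", CI_TXN_DTL)
       .option("batchsize", "100000")
       .option("numPartitions", "16")
       .save();

With this, the 10M-row write is split across 16 parallel inserts instead of a single connection, which is usually where most of the wall-clock time goes.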

Upvotes: 3
