Reputation: 83
I am using Apache Spark to load and process a dataset, then write it into an Oracle DB via the JDBC driver.
I am using Spark's JDBC write method to write the Dataset into the database. However, writing 10 rows takes about the same time as writing 10 million rows into different tables of the database. I want to know how to performance-tune this write method in Spark, so that we make good use of the Apache Spark compute engine; otherwise there is little benefit in using it for fast computation if writing the dataset to the database takes so long. The code that writes the 10 rows and the 10M rows is as follows:
finalpriceItemParamsGroupTable.distinct().write()
        .mode("append")
        .format("jdbc")
        .option("url", connection)
        .option("dbtable", CI_PRICEITEM_PARM)
        .save();

finalPritmOutput.distinct().write()
        .mode("append")
        .format("jdbc")
        .option("url", connection)
        .option("dbtable", CI_TXN_DTL)
        .save();
Attaching a screenshot of the Apache Spark dashboard: Spark Stages Screenshot
Any help would be appreciated.
Upvotes: 1
Views: 3507
Reputation: 492
You can bulk-insert more records at once, rather than inserting 1,000 records at a time (the default), by adding the batchsize option and increasing its value:
finalPritmOutput.distinct().write()
.mode("append")
.format("jdbc").option("url", connection)
.option("dbtable", CI_TXN_DTL)
.option("batchsize", "100000")
.save()
Refer to https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases for how to configure the JDBC writer for better performance.
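Parallelism also matters: Spark opens one JDBC connection per partition of the DataFrame, and the batchsize above applies per partition. A minimal sketch of combining the two, assuming the same finalPritmOutput dataset and connection variables as above (the partition count of 8 is illustrative; choose a number your Oracle instance can handle):

// Repartition before writing so each partition opens its own JDBC
// connection and inserts concurrently. Assumed values: finalPritmOutput,
// connection, and CI_TXN_DTL come from the question; 8 is a placeholder.
finalPritmOutput.distinct()
        .repartition(8)                  // 8 concurrent JDBC connections
        .write()
        .mode("append")
        .format("jdbc")
        .option("url", connection)
        .option("dbtable", CI_TXN_DTL)
        .option("batchsize", "100000")   // rows sent per batch, per partition
        .save();

Note that more partitions is not always faster: each connection adds load on the database, so benchmark a few partition counts against your Oracle instance before settling on one.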
Upvotes: 3