Reputation: 249
I am trying to save a Spark DataFrame to Oracle. The save works, but the performance seems to be very poor.
I have tried 2 approaches:
dfToSave.write().mode(SaveMode.Append).jdbc(…)
JdbcUtils.saveTable(dfToSave, ORACLE_CONNECTION_URL, "table", props)
I suppose the first one uses the second API internally.
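For context, here is a minimal sketch of the first approach in the Java API. The connection URL, credentials, and batch size are placeholders, not values from the question; raising the `batchsize` option (which defaults to 1000 in Spark's JDBC writer) is one common knob for reducing per-row round trips to the database:

```java
import java.util.Properties;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class OracleWriteSketch {
    public static void save(Dataset<Row> dfToSave) {
        Properties props = new Properties();
        props.setProperty("user", "scott");       // placeholder credentials
        props.setProperty("password", "tiger");
        props.setProperty("driver", "oracle.jdbc.OracleDriver");

        dfToSave.write()
                .mode(SaveMode.Append)
                // Rows sent per JDBC batch; Spark's default is 1000.
                .option("batchsize", "10000")
                .jdbc("jdbc:oracle:thin:@//dbhost:1521/ORCL", // placeholder URL
                      "table",
                      props);
    }
}
```

Each DataFrame partition opens its own JDBC connection, so the write's parallelism is bounded by the partition count at the time of the save.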
Both seem to take very long: more than 3 minutes for a DataFrame of only 400-500 rows.
I came across JIRA SPARK-10040, but it says the issue was resolved in 1.6.0, which is the version I am using.
Has anyone faced this issue and knows how to resolve it?
Upvotes: 3
Views: 1448
Reputation: 540
I can tell you what happened to me. I had turned down the number of partitions in order to query the database, so my previously performant processing (PPP) became quite slow. Since my dataset is only materialized when I write it back to the database, I (like you) assumed there was a problem with the Spark API, the driver, the connection, the table structure, the server configuration, anything. But no: you just have to repartition after your query.
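A sketch of the fix, assuming a hypothetical partition count of 8 (pick one that suits your cluster; each partition writes over its own JDBC connection, so too many partitions means too many connections):

```java
import java.util.Properties;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class RepartitionBeforeWrite {
    public static void save(Dataset<Row> queryResult, String url, Properties props) {
        // Restore parallelism lost to an earlier coalesce/repartition
        // before the JDBC write, so the save is not single-threaded.
        queryResult.repartition(8)
                   .write()
                   .mode(SaveMode.Append)
                   .jdbc(url, "table", props);
    }
}
```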
Upvotes: 2