Nitin Kumar

Reputation: 249

Performance Issue with writing Spark Dataframes to Oracle Database

I am trying to save a Spark DataFrame to an Oracle database. The save works, but performance is very poor.

I have tried 2 approaches:

  1. dfToSave.write().mode(SaveMode.Append).jdbc(…) -- I assume this uses the API below internally.
  2. JdbcUtils.saveTable(dfToSave, ORACLE_CONNECTION_URL, "table", props)

Both take very long: more than 3 minutes for a DataFrame of only 400-500 rows.
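For reference, here is a minimal sketch of the first approach in Java. The URL, table name, and credentials are placeholders; the `batchsize` connection property is the knob introduced by SPARK-10040 (it controls how many rows are sent per JDBC `executeBatch()` round trip, defaulting to 1000), so it is a reasonable first thing to check when writes are slow:

```java
import java.util.Properties;

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SaveMode;

public class OracleWriteSketch {

    // Placeholder connection details; adjust for your environment.
    static final String ORACLE_CONNECTION_URL =
            "jdbc:oracle:thin:@//dbhost:1521/ORCL";

    static void save(DataFrame dfToSave) {
        Properties props = new Properties();
        props.setProperty("user", "scott");        // placeholder
        props.setProperty("password", "tiger");    // placeholder
        props.setProperty("driver", "oracle.jdbc.OracleDriver");
        // SPARK-10040 batches inserts; raise "batchsize" to cut round trips.
        props.setProperty("batchsize", "5000");

        dfToSave.write()
                .mode(SaveMode.Append)
                .jdbc(ORACLE_CONNECTION_URL, "my_table", props);
    }
}
```

This requires a running Oracle instance and the Oracle JDBC driver on the classpath, so it is illustrative rather than directly runnable here.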

I came across JIRA SPARK-10040, but it is marked as resolved in 1.6.0, which is the version I am using.

Has anyone faced this issue and found a way to resolve it?

Upvotes: 3

Views: 1448

Answers (1)

Ion Freeman

Reputation: 540

I can tell you what happened to me. I had turned down the number of partitions to query the database, so my previously performant processing (PPP) became quite slow. However, since my dataset only materializes when I post it back to the database, I (like you) thought there was a problem with the Spark API, the driver, the connection, the table structure, the server configuration, anything. But no: you just have to repartition after your query.
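In Java terms, the fix described above looks roughly like the sketch below. The partition count of 8 is an arbitrary example (tune it to your executor count and to how many concurrent sessions your Oracle instance tolerates); the URL, table names, and `props` are placeholders:

```java
import java.util.Properties;

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SaveMode;

public class RepartitionBeforeWrite {

    static void copyTable(SQLContext sqlContext, String url, Properties props) {
        // Reading over JDBC without partitioning options yields a
        // single-partition DataFrame, so the later write is single-threaded.
        DataFrame df = sqlContext.read().jdbc(url, "source_table", props);

        // Restore parallelism before writing: each partition opens its own
        // JDBC connection and inserts in parallel.
        DataFrame repartitioned = df.repartition(8);

        repartitioned.write()
                .mode(SaveMode.Append)
                .jdbc(url, "target_table", props);
    }
}
```

Note that more partitions is not always better: each one is a separate connection and transaction against Oracle, so there are diminishing (and eventually negative) returns.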

Upvotes: 2
