Performance comparison: cx_Oracle vs write.format('jdbc')

Can anyone tell me which way of inserting into Oracle performs better?

write.format('jdbc') or cx_Oracle?

In my project I came across a case where write.format('jdbc') is used for the INSERT and cx_Oracle for the UPDATE, so I'm thinking of moving both the INSERT and the UPDATE onto the same cx_Oracle connection. What do you think?

Answers (1)

Rahul Kumar

I have worked on a similar use case. Here are some takeaways from my last project.

  1. cx_Oracle is very slow compared to write.format('jdbc'). I was inserting 1M records and there was a drastic difference between the two approaches; even cx_Oracle's executemany didn't help much. I strongly recommend using Spark JDBC.

  2. Even for updates, I ended up doing a delete (SQL query) followed by an insert (using PySpark), because I couldn't do updates through Spark and the alternative was very slow.

  3. Spark also writes to the database in parallel when inserting: each partition is written over its own JDBC connection.

  4. For reads, too, use Spark's JDBC reader, because Spark optimizes the job and pushes projections (column selection) and filters down to the database.
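A minimal PySpark sketch of points 1, 3 and 4, assuming a hypothetical table `target(id, val)`, a thin-driver URL such as `jdbc:oracle:thin:@//host:1521/service`, and the Oracle JDBC jar on Spark's classpath. `numPartitions` caps how many partitions (and therefore concurrent connections) write at once, and `batchsize` sets how many rows go into each JDBC batch; the values here are illustrative, not tuned. Function names are hypothetical.

```python
# Minimal sketch of Spark JDBC writes and reads against Oracle.
# URL, table "target(id, val)", credentials and tuning values are
# hypothetical; the Oracle JDBC driver jar must be on Spark's classpath.


def base_options(url, table, user, password):
    """Connection options shared by reads and writes."""
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "oracle.jdbc.OracleDriver",
    }


def write_options(base, num_partitions=8, batch_size=10_000):
    """Add write tuning without mutating `base`.

    numPartitions caps concurrent JDBC connections (Spark coalesces down
    to it); batchsize is rows per JDBC batch (Spark's default is 1000).
    """
    opts = dict(base)
    opts["numPartitions"] = str(num_partitions)
    opts["batchsize"] = str(batch_size)
    return opts


def append_to_oracle(df, opts):
    # Each partition inserts over its own connection, in parallel.
    df.write.format("jdbc").options(**opts).mode("append").save()


def read_recent(spark, opts, min_id):
    # The column selection and the simple comparison filter are pushed
    # down, so Oracle returns only these columns and matching rows.
    return (spark.read.format("jdbc").options(**opts).load()
            .select("id", "val")
            .where(f"id >= {int(min_id)}"))
```

For the delete-then-insert pattern in point 2, the DELETE would run first (for example through a cx_Oracle cursor) and `append_to_oracle` afterwards. Calling `.explain()` on the DataFrame returned by `read_recent` shows the `PushedFilters` entry, which is a quick way to confirm the filter actually reached Oracle.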
