Arnab Mandal
Arnab Mandal

Reputation: 21

Azure databricks - Do we have postgres connector for spark

Azure databricks - Do we have postgres connector for spark Also, how to upsert/update record in postgres using spark databricks. I am using Spark 3.1.1 When trying to write using mode=overwrite, it truncates the table but recird is not getting inserted I am new to this. Please help.

Upvotes: 1

Views: 2426

Answers (1)

Alex Ott
Alex Ott

Reputation: 87319

You don't need a separate connector for PostgreSQL - it works via standard JDBC connector, and PostgreSQL JDBC driver should be included into databricks runtime - check release notes for your specific runtime. So you just need to form a correct JDBC URL as described in documentation (Spark documentation also has examples of URL for PostgreSQL).

Something like this:

df.write \
  .format("jdbc") \
  .option("url", "jdbc:postgresql:dbserver") \
  .option("dbtable", "schema.tablename") \
  .option("user", "username") \
  .option("password", "password") \
  .save()

Regarding the UPSERT, it's not so simple, not only for PostgreSQL, but also for other databases:

  • either you do the full join, and select entries not existing only in your dataset, and the rest is taking from the database, and then overwrite - but this is very expensive because you're reading a full database, and writing it back
  • or you're doing left join with database (you need to read it again) going down to RDD with .foreachPartition/.foreach, and forming a series of INSERT/UPDATE operations depending on if data exist in database, or not - it's doable, but you need more experience.
  • Specifically for PosgtreSQL you can convert this (foreach) into INSERT ... ON CONFLICT so you won't need to read full database (see their wiki for more information about this operation)

Another approach - write your data into temporary table, and then via JDBC issue the MERGE command to incorporate your changes into the table. This is more "lightweight" method from my point of view.

Upvotes: 0

Related Questions