Reputation: 21
Azure Databricks - is there a PostgreSQL connector for Spark? Also, how do I upsert/update a record in PostgreSQL from Spark on Databricks? I am using Spark 3.1.1. When trying to write using mode=overwrite, it truncates the table, but the record is not getting inserted. I am new to this. Please help.
Upvotes: 1
Views: 2426
Reputation: 87319
You don't need a separate connector for PostgreSQL - it works via the standard JDBC connector, and the PostgreSQL JDBC driver should be included in the Databricks runtime - check the release notes for your specific runtime. So you just need to form a correct JDBC URL as described in the documentation (the Spark documentation also has an example URL for PostgreSQL).
Something like this:
df.write \
    .format("jdbc") \
    .option("url", "jdbc:postgresql:dbserver") \
    .option("dbtable", "schema.tablename") \
    .option("user", "username") \
    .option("password", "password") \
    .save()
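Note that without an explicit .mode(...), this write fails if the table already exists; .mode("append") adds rows without touching existing data, while .mode("overwrite") drops or truncates the table first (which explains the behaviour you're seeing). A full PostgreSQL JDBC URL typically looks like jdbc:postgresql://<host>:5432/<database>.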
Regarding the UPSERT, it's not so simple, not only for PostgreSQL, but also for other databases:

- You can iterate over the rows (using .foreachPartition/.foreach) and form a series of INSERT/UPDATE operations depending on whether the data already exist in the database - it's doable, but you need more experience.
- You can turn your inserts (via .foreach or .foreachPartition) into INSERT ... ON CONFLICT statements, so you won't need to read the full database first (see the PostgreSQL wiki for more information about this operation); a sketch follows below.
- Another approach - write your data into a temporary table, and then via JDBC issue a MERGE command to incorporate your changes into the target table. This is the more "lightweight" method from my point of view; see the second sketch below.
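Here is a minimal sketch of the INSERT ... ON CONFLICT approach with .foreachPartition, assuming the psycopg2 driver is installed on the cluster and that the target table has columns id (the primary key) and value - the connection parameters and all names are placeholders to adapt:

import psycopg2  # assumed to be available on the cluster

def upsert_partition(rows):
    # open one connection per partition, instead of one per row
    conn = psycopg2.connect(host="dbserver", dbname="mydb",
                            user="username", password="password")
    with conn, conn.cursor() as cur:
        for row in rows:
            # insert, or update the existing row when the key already exists
            cur.execute(
                """INSERT INTO schema.tablename (id, value)
                   VALUES (%s, %s)
                   ON CONFLICT (id) DO UPDATE SET value = EXCLUDED.value""",
                (row["id"], row["value"]),
            )
    conn.close()

df.foreachPartition(upsert_partition)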
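And a sketch of the temporary-table variant: write the batch into a staging table with the JDBC writer, then run a single set-based upsert over a plain JDBC connection. The staging table name and columns are assumptions; note that a native MERGE statement only exists from PostgreSQL 15 onward, so INSERT ... ON CONFLICT is used here for the merge step:

# 1. load the batch into a staging table (overwrite is fine, it's scratch data)
df.write \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://dbserver:5432/mydb") \
    .option("dbtable", "schema.tablename_staging") \
    .option("user", "username") \
    .option("password", "password") \
    .mode("overwrite") \
    .save()

# 2. merge the staging table into the target in one statement
import psycopg2
conn = psycopg2.connect(host="dbserver", dbname="mydb",
                        user="username", password="password")
with conn, conn.cursor() as cur:
    cur.execute("""
        INSERT INTO schema.tablename (id, value)
        SELECT id, value FROM schema.tablename_staging
        ON CONFLICT (id) DO UPDATE SET value = EXCLUDED.value
    """)
conn.close()

This keeps the row-by-row work inside the database, which is usually faster than issuing statements from the executors.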
Upvotes: 0