Reputation: 396
I have a use case where I want to update specific rows, matched by arbitrary identifier/where-clause conditions, in Oracle or SQL Server from Databricks.
Reading via spark.read.format("jdbc") works fine against either database, but I could not find an easy way to update specific rows back to these DBs.
If I use:
# mode can be 'overwrite' or any of the other save modes
df.write.format("jdbc") \
    .option("url", dbServerJdbcUrl) \
    .option("user", username) \
    .option("secret", password) \
    .option("driver", <either com.microsoft.sqlserver.jdbc.SQLServerDriver or oracle.jdbc.driver.OracleDriver>) \
    .option("dbTable", <table on the database platform>) \
    .mode('overwrite') \
    .save()
it only overwrites the whole "dbTable" on the database. So far I could not find a way to make it work with .option("query", "<update statement>") either.
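For reference, the query option only seems to be honored on the read path; something like the following works for reading (the table and columns here are just an example), but there is no write-side counterpart that would accept an UPDATE:

# "query" replaces "dbtable" when reading; it is not supported for writes
df = spark.read.format("jdbc") \
    .option("url", dbServerJdbcUrl) \
    .option("user", username) \
    .option("password", password) \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .option("query", "SELECT id, status FROM my_table WHERE status = 'open'") \
    .load()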
If I write to a temporary/staging table instead, it becomes a two-stage job: after the write, I have to go back to the database side and update the actual tables from the staging table.
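Concretely, that two-stage flow looks something like this (the staging table my_table_stage, target my_table, and join key id are placeholders, and reaching java.sql.DriverManager through spark.sparkContext._jvm is a py4j internal handle, not a public API; Oracle's MERGE INTO syntax differs slightly):

# Stage 1: write only the changed rows to a staging table
updates_df.write.format("jdbc") \
    .option("url", dbServerJdbcUrl) \
    .option("user", username) \
    .option("password", password) \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .option("dbtable", "my_table_stage") \
    .mode("overwrite") \
    .save()

# Stage 2: run one set-based MERGE on the server, issued from the driver node
conn = spark.sparkContext._jvm.java.sql.DriverManager \
    .getConnection(dbServerJdbcUrl, username, password)
try:
    stmt = conn.createStatement()
    stmt.executeUpdate(
        "MERGE my_table AS t "
        "USING my_table_stage AS s ON t.id = s.id "
        "WHEN MATCHED THEN UPDATE SET t.status = s.status;")
    stmt.close()
finally:
    conn.close()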
Another note: when I do the above write against a table that has millions of rows and I only want to update a handful of them, every one of the save modes just causes more trouble.
Is there a better way to have Databricks update specific rows in a database?
Upvotes: 1
Views: 1205
Reputation: 78
The code snippet you provided is PySpark writing to a database over JDBC, but it has a couple of problems: the option key for the password is "password", not "secret", and Python comments use # rather than //. Here's a corrected and completed version:
# use "oracle.jdbc.driver.OracleDriver" for Oracle
df.write.format("jdbc") \
    .option("url", dbServerJdbcUrl) \
    .option("user", username) \
    .option("password", password) \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .option("dbtable", "your_table_name") \
    .mode('overwrite') \
    .save()
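One caveat: with plain mode('overwrite'), Spark drops and recreates the target table, losing indexes and constraints. Adding the JDBC writer's truncate option keeps the table definition, although it still replaces every row rather than updating specific ones (same placeholder names as above):

# truncate the table instead of dropping and recreating it on overwrite
df.write.format("jdbc") \
    .option("url", dbServerJdbcUrl) \
    .option("user", username) \
    .option("password", password) \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .option("dbtable", "your_table_name") \
    .option("truncate", "true") \
    .mode('overwrite') \
    .save()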
Upvotes: 0