Rajesh

Reputation: 115

Delete functionality with Spark SQL DataFrame

I have a requirement to load and delete specific records from a Postgres database in my Spark application. For loading, I am using a Spark DataFrame in the format below:

sqlContext.read.format("jdbc").options(Map("url" -> "postgres url",
      "user" -> "user",
      "password" -> "xxxxxx",
      "dbtable" -> "(select * from employee where emp_id > 1000) as filtered_emp")).load()

To delete the data, I am writing direct SQL instead of using DataFrames:

delete from employee where emp_id > 1000

The question is: is there a Spark way of deleting records in the database, something similar to the below? Or is using direct SQL the only way?

sqlContext.read.format("jdbc").options(Map("url" -> "postgres url",
      "user" -> "user",
      "password" -> "xxxxxx",
      "dbtable" -> "(delete from employee where emp_id > 1000) as filtered_emp")).load()

Upvotes: 4

Views: 12486

Answers (2)

Batuhan Tüter

Reputation: 311

I have experienced something similar and used the Java JDBC libraries from Scala to delete rows from the table. You can see the solution here: Delete rows from Azure Sql table using Azure Databricks with Scala
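In short, the approach looks something like this (a minimal sketch; the connection URL, credentials and table name are placeholders, and the Postgres JDBC driver must be on the classpath):

import java.sql.DriverManager

// Open a plain JDBC connection from the driver (placeholder URL/credentials)
val connection = DriverManager.getConnection("jdbc:postgresql://host:5432/db", "user", "xxxxxx")
try {
  val statement = connection.createStatement()
  // Issue the DELETE directly against the table
  val deleted = statement.executeUpdate("delete from employee where emp_id > 1000")
  println(s"Deleted $deleted rows")
} finally {
  connection.close()
}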

Upvotes: 0

Amit Kulkarni

Reputation: 704

If you want to modify (delete records from) the actual source of the data, i.e. the tables in Postgres, then Spark wouldn't be a great fit. You can use a JDBC client directly to achieve the same.

If you want to do this anyway (in a distributed manner, based on some clues that you are computing as part of your DataFrames), you can run the same JDBC client code against the DataFrame that holds the logic/trigger info for deleting records, so the deletes are executed on multiple workers in parallel, as in the sketch below.
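For illustration, a rough sketch of that pattern (the DataFrame name idsToDelete, the key column and the connection details are assumptions; one connection is opened per partition, so the deletes run on the workers):

import java.sql.DriverManager
import org.apache.spark.sql.Row

// idsToDelete: a DataFrame with a single emp_id column, computed earlier in the job
idsToDelete.foreachPartition { rows: Iterator[Row] =>
  // One connection per partition, opened on the worker (placeholder URL/credentials)
  val connection = DriverManager.getConnection("jdbc:postgresql://host:5432/db", "user", "xxxxxx")
  try {
    val statement = connection.prepareStatement("delete from employee where emp_id = ?")
    rows.foreach { row =>
      statement.setLong(1, row.getLong(0))
      statement.addBatch()
    }
    statement.executeBatch() // one round trip of deletes per partition
  } finally {
    connection.close()
  }
}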

Upvotes: 3
