Reputation: 143
Now I need to diff two table using spark sql,i find a sql server's answer like this :
(SELECT *
FROM table1
EXCEPT
SELECT *
FROM table2)
UNION ALL
(SELECT *
FROM table2
EXCEPT
SELECT *
FROM table1)
Hope somebody can tell me how to using spark sql like this in sql server? (Do not care the special col ,just use *)
Upvotes: 2
Views: 9479
Reputation: 3544
You can do this something like this :
scala> val df1=sc.parallelize(Seq((1,2),(3,4))).toDF("a","b")
df1: org.apache.spark.sql.DataFrame = [a: int, b: int]
scala> val df2=sc.parallelize(Seq((1,2),(5,6))).toDF("a","b")
df2: org.apache.spark.sql.DataFrame = [a: int, b: int]
scala> df1.create
createOrReplaceTempView createTempView
scala> df1.createTempView("table1")
scala> df2.createTempView("table2")
scala> spark.sql("select * from table1 EXCEPT select * from table2").show
+---+---+
| a| b|
+---+---+
| 3| 4|
+---+---+
scala> spark.sql("(select * from table2 EXCEPT select * from table1) UNION ALL (select * from table1 EXCEPT select * from table2)").show
+---+---+
| a| b|
+---+---+
| 5| 6|
| 3| 4|
+---+---+
Note : In your case you have to make dataframe out of the JDBC calls and then register the table and perform the operations.
Upvotes: 7