Reputation: 467
I have two DataFrames. One is a MasterList, the other is an InsertList
MasterList:
+--------+--------+
| ttm_id|audit_id|
+--------+--------+
| 1| 10|
| 15| 10|
+--------+--------+
InsertList:
+--------+--------+
| ttm_id|audit_id|
+--------+--------+
| 1| 10|
| 15| 9|
+--------+--------+
In Scala, how do I join two DataFrames but only append to the new DataFrame records
WHERE MasterList.ttm_id = InsertList.ttm_id AND
MasterList.audit_id != InsertList.audit_id
-
ExpectedOutput:
+--------+--------+
| ttm_id|audit_id|
+--------+--------+
| 1| 10|
| 15| 10|
| 15| 9|
+--------+--------+
Upvotes: 0
Views: 1887
Reputation: 41957
It seems that you want to merge rows from insertList
dataFrame that are not in masterList
dataFrame. This can be achived using except
function
insertList.except(masterList)
And you just use union
function merge both dataFrames as
masterList.union(insertList.except(masterList))
You should get what you desire as
+------+--------+
|ttm_id|audit_id|
+------+--------+
|1 |10 |
|15 |10 |
|15 |9 |
+------+--------+
Upvotes: 2
Reputation: 35219
I'd anti join (NOT IN
) by both columns and union
val masterList = Seq((1, 10), (15, 10)).toDF("ttm_id", "audit_id")
val insertList = Seq((1, 10), (15, 9)).toDF("ttm_id", "audit_id")
insertList
.join(masterList, Seq("ttm_id", "audit_id"), "leftanti")
.union(masterList)
.show
// +------+--------+
// |ttm_id|audit_id|
// +------+--------+
// | 15| 9|
// | 1| 10|
// | 15| 10|
// +------+--------+
Upvotes: 2