Reputation: 21
Please refer following dataframes
I want to get mismatching rows in col2 after matching col1 of both dataframes I am trying following but it's not producing result because seems the Dataframe join is forming Cartesian
val dfs = Seq((1,1),(1,2),(1,3),(2,6)).toDF("col1","col2")
val dft = Seq((1,1),(1,2),(1,4)).toDF("col1","col2")
dfs.join(dft,"col1").filter(dfs("col2").notEqual(dft("col2"))).show
In above case I expect the join & filter to return result (1,3) But seems it's joining every row of col1 in dfs to every row in col1 on dft thus producing unwanted result
Is the Cartesion as following normal behaviour for Dataframe join or I am missing some setting? how can I get (1,3) as output?
scala> dft.join(dfs,dft("col1")===dfs("col1")).show
+----+----+----+----+
|col1|col2|col1|col2|
+----+----+----+----+
| 1| 1| 1| 3|
| 1| 1| 1| 2|
| 1| 1| 1| 1|
| 1| 2| 1| 3|
| 1| 2| 1| 2|
| 1| 2| 1| 1|
| 1| 4| 1| 3|
| 1| 4| 1| 2|
| 1| 4| 1| 1|
+----+----+----+----+
Thanks chetab
Upvotes: 2
Views: 415
Reputation:
This is not Cartesian product. You join by col1
so output contains all combinations of row with matching col1
. Result is correct.
Upvotes: 1