Reputation: 73
I have 3 tables and need to fetch the records as below
Table_A,
Table_B,
Table_C
Select only Table_A records which are common in Table_B & Table_C and ignore which are not common in both Table_B & Table_C finally results would be no duplicates.
Approach 1 Tried: inner join Table_A with Table_B and again separate inner join Table_A with Table_C finally did union.
Ab = Table_A.join(Table_B,Table_A["id"] == Table_B["id"], "inner").select(common columns)
Ac = Table_A.join(Table_C,Table_A["id"] == Table_C["id"], "inner").select(common columns)
result = Ab.union(Ac) <<Got more duplicates>>
result = result,dropDuplicates(["id"])
But still I got the duplicates.
Approach 2 Tried with SparkSql:
Table_A
left outer
Table_B
on A.id = B.id
left outer Table_C
on A.id = c.id
In this Approach, no duplicates but more records than Table_A also the uncommon records.
Any suggestion and best approach would be apprciated
Upvotes: 1
Views: 55
Reputation: 222622
In Spark SQL, I would recommend exists
:
select a.*
from table_a a
where exists (select 1 from table_b b on b.id = a.id)
and exists (select 1 from table_c c on c.id = a.id)
This does the filtering you want, and will not duplicate records of table_a
in the resuletset, even if there are multiple matches in table_b
or table_c
.
Upvotes: 2