Govinda

Reputation: 73

Sparksql to select certain records against 3 tables

I have 3 tables and need to fetch the records as below

Table_A,
Table_B,
Table_C

Select only the Table_A records whose ids appear in both Table_B and Table_C, and ignore records that are missing from either table. The final result should contain no duplicates.

Approach 1 tried: inner join Table_A with Table_B, then a separate inner join of Table_A with Table_C, and finally a union of the two results.

Ab = Table_A.join(Table_B,Table_A["id"] == Table_B["id"], "inner").select(common columns)

Ac = Table_A.join(Table_C,Table_A["id"] == Table_C["id"], "inner").select(common columns)

result = Ab.union(Ac)                   # <<Got more duplicates>>
result = result.dropDuplicates(["id"])

But even after dropDuplicates I still got duplicates.

Approach 2 tried with Spark SQL:

select a.*
from Table_A a
left outer join Table_B b
  on a.id = b.id
left outer join Table_C c
  on a.id = c.id

With this approach there were no duplicates, but the result had more records than Table_A and also included the uncommon records.

Any suggestion on the best approach would be appreciated.

Upvotes: 1

Views: 55

Answers (1)

GMB

Reputation: 222622

In Spark SQL, I would recommend exists:

select a.*
from table_a a
where exists (select 1 from table_b b where b.id = a.id)
  and exists (select 1 from table_c c where c.id = a.id)

This does the filtering you want, and will not duplicate records of table_a in the result set, even if there are multiple matches in table_b or table_c.

Upvotes: 2
