pingboing
pingboing

Reputation: 69

How to do "case when exists..." in spark sql

What I am trying to do is

case when exists (select 1 from table B where A.id = B.id and B.value in (1,2,3)) then 'Y' else 'N' end as Col_1

It seems like "left semi join" can take care of multiple matching issue, but my understanding is that "left semi join" does not allow using columns from the right (B) table, so how can I add condition "B.value in (1,2,3)"?

Upvotes: 0

Views: 3154

Answers (1)

Chris Maurer
Chris Maurer

Reputation: 2625

The normal way to do this is to left outer join to a summary of table b:

Select a.id, Case When IsNull(b.id) Then 'N' else 'Y' end as Col_1
From A Left Outer Join 
    (Select distinct id from tableb) b On A.id=b.id

That way you are not repeatedly executing a lookup query for every id in A.

Addition Your comment indicated that you are trying to create multiple Y/N columns based on b values. Your example had a Y/N for col1 when there was a 1,2,3 and a Y/N for col2 when there was a 4,5,6. You can get there easily with one summarization of table b :

Select a.id, Case When IsNull(b.val123) Then 'N' else 'Y' end as Col_1,
             Case When IsNull(b.val456) Then 'N' Else 'Y' end as Col_2
From A Left Outer Join 
    (Select id, max(Case When value in (1,2,3) Then 'Y' End) as val123
            max(Case When value in (4,5,6) Then  'Y' End) as val456
     From tableb
     Group By id) b On A.id=b.id

This still accomplishes that lookup with only one summarization of table b.

Upvotes: 2

Related Questions