Reputation: 528
I have a where clause in Spark SQL that for some reason doesn't return any records. I think it doesn't work and so I would like to ask what would be equivalent to it?
SELECT
c.client_id,
current_date() as insert_date
FROM
CLIENT_SUB c
WHERE
(c.client_id, insert_date) not in (SELECT client_id, insert_date from CLIENT_SUBSCRIBER_CONTRACT)
I heard I could do it with a join
Upvotes: 0
Views: 807
Reputation: 222462
I would recommend not exists
: it is null
-safe, while not it
isn't - and it usually scales better, too.
I am also suspiscious about the reference to insert_date
: do you really mean that, or you actually want current_date()
?
select cs.client_id, current_date() as insert_date
from client_sub cs
where not exists (
select 1
from client_subscriber_contract csc
where
csc.client_id = c.client_id
and csc.insert_date = cs.insert_date
-- or, maybe: csc.insert_date = current_date()
)
For performance, consider an index on client_subscriber_contract(client_id, insert_date)
.
Upvotes: 2
Reputation: 5803
I suspect you have null values in the output of your subquery because not in
doesn't output anything when matched against values containing nulls. Try
not in (select client_id, insert_date
from CLIENT_SUBSCRIBER_CONTRACT
where coalesce(client_id, insert_date) is not null)
I recommend looking into not exists
for your purpose though
Upvotes: 0