Reputation: 6010
We would expect that this Google BigQuery query would remove at most 10 rows of results - but this query gives us zero results - despite that table A has thousands of rows all with unique ENCNTR_IDs.
SELECT ENCNTR_ID
FROM `project.dataset.table_A`
WHERE ENCNTR_ID NOT IN
(
SELECT ENCNTR_ID
FROM `project.dataset.table_B`
LIMIT 10
)
If we make the query self-referential, it behaves as expected: we get thousands of results with just 10 rows removed.
SELECT ENCNTR_ID
FROM `project.dataset.table_A`
WHERE ENCNTR_ID NOT IN
(
SELECT ENCNTR_ID
FROM `project.dataset.table_A` # <--- same table name
LIMIT 10
)
What are we doing wrong? Why does the first query give us zero results rather than just remove 10 rows of results?
Upvotes: 4
Views: 1965
Reputation: 59165
Solution: Use NOT EXISTS
instead of NOT IN
when dealing with possible nulls:
SELECT *
FROM UNNEST([1,2,3]) i
WHERE NOT EXISTS (SELECT * FROM UNNEST([2,3,null]) i2 WHERE i=i2)
# 1
Previous guess - which turned out to be the cause:
SELECT *
FROM UNNEST([1,2,3]) i
WHERE i NOT IN UNNEST([2,3])
# 1
vs
SELECT *
FROM UNNEST([1,2,3]) i
WHERE i NOT IN UNNEST([2,3,null])
# This query returned no results.
Are there any nulls in that project.dataset.table_B
?
Upvotes: 4