Praxiteles
Praxiteles

Reputation: 6010

Why is this BigQuery WHERE NOT IN statement giving no results?

We would expect that this Google BigQuery query would remove at most 10 rows of results - but this query gives us zero results - despite that table A has thousands of rows all with unique ENCNTR_IDs.

SELECT ENCNTR_ID
FROM `project.dataset.table_A`
WHERE ENCNTR_ID NOT IN
(
SELECT ENCNTR_ID
FROM `project.dataset.table_B`
LIMIT 10
)

If we make the query self-referential, it behaves as expected: we get thousands of results with just 10 rows removed.

SELECT ENCNTR_ID
FROM `project.dataset.table_A`
WHERE ENCNTR_ID NOT IN
(
SELECT ENCNTR_ID
FROM `project.dataset.table_A` # <--- same table name
LIMIT 10
)

What are we doing wrong? Why does the first query give us zero results rather than just remove 10 rows of results?

Upvotes: 4

Views: 1965

Answers (1)

Felipe Hoffa
Felipe Hoffa

Reputation: 59165

Solution: Use NOT EXISTS instead of NOT IN when dealing with possible nulls:

SELECT *
FROM UNNEST([1,2,3]) i
WHERE NOT EXISTS (SELECT * FROM UNNEST([2,3,null]) i2 WHERE i=i2)

# 1

Previous guess - which turned out to be the cause:

SELECT *
FROM UNNEST([1,2,3]) i
WHERE i NOT IN UNNEST([2,3])

# 1

vs

SELECT *
FROM UNNEST([1,2,3]) i
WHERE i NOT IN UNNEST([2,3,null])

# This query returned no results.

Are there any nulls in that project.dataset.table_B?

Upvotes: 4

Related Questions