Ben Dowling
Ben Dowling

Reputation: 17541

BigQuery BETWEEN JOIN

I've got 2 tables in BigQuery that I'd like to join. Table 1 has integers, and table 2 has non-overlapping integer ranges (start, end). I'd like to join table 1 and 2 to give me something like this:

-- table 1
value
1
4
9
10

-- table 2
start, end
0,5
6,9
10,15

-- joined
value,start,end
1,0,5
4,0,5
9,6,9
10,10,15

I thought this query would work:

SELECT * 
FROM 
[table1] a
INNER JOIN [table2] b 
ON a.value BETWEEN b.start AND b.end

But that gives me this error

ON clause must be AND of = comparisons of one field name from each table, with all field names prefixed with table name

I can get the correct result with this CROSS JOIN query:

SELECT * 
FROM 
[table1] a
CROSS JOIN [table2] b 
WHERE a.value BETWEEN b.start AND b.end

But the docs say this should be avoided if possible:

CROSS JOIN operations do not allow ON clauses. CROSS JOIN can return a large amount of data and might result in a slow and inefficient query or in a query that exceeds the maximum allowed per-query resources. Such queries will fail with an error. When possible, prefer queries that do not use CROSS JOIN

So, is it possible to do an INNER JOIN with a between, or improve the CROSS JOIN some other way?

Upvotes: 2

Views: 3786

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172944

This is a limitation for BigQuery Legacy SQL.

You should use BigQuery Standard SQL instead:

#standardSQL
SELECT * 
FROM 
`table1` a
INNER JOIN `table2` b 
ON a.value BETWEEN b.start AND b.end   

In standard SQL - you should use back-ticks instead of brackets.

Also keep in mind that end is a reserved keyword, so to make above work you need to enclose it in back-ticks also.

See below (along with dummy data from your question):

#standardSQL
WITH table1 AS (
  SELECT value 
  FROM UNNEST([1, 4, 9, 10]) AS value
),
table2 AS (
  SELECT chunk.start, chunk.`end` 
  FROM UNNEST([STRUCT<start INT64, `end` INT64>(0,5),(6,9),(10,15)]) AS chunk
)
SELECT * 
FROM `table1` a
INNER JOIN `table2` b 
ON a.value BETWEEN b.start AND b.`end`
-- ORDER BY value

Upvotes: 4

Related Questions