Reputation: 17541
I've got 2 tables in BigQuery that I'd like to join. Table 1 has integers, and table 2 has non-overlapping integer ranges (start, end). I'd like to join table 1 and 2 to give me something like this:
-- table 1
value
1
4
9
10
-- table 2
start, end
0,5
6,9
10,15
-- joined
value,start,end
1,0,5
4,0,5
9,6,9
10,10,15
I thought this query would work:
SELECT *
FROM
[table1] a
INNER JOIN [table2] b
ON a.value BETWEEN b.start AND b.end
But that gives me this error
ON clause must be AND of = comparisons of one field name from each table, with all field names prefixed with table name
I can get the correct result with this CROSS JOIN query:
SELECT *
FROM
[table1] a
CROSS JOIN [table2] b
WHERE a.value BETWEEN b.start AND b.end
But the docs say this should be avoided if possible:
CROSS JOIN operations do not allow ON clauses. CROSS JOIN can return a large amount of data and might result in a slow and inefficient query or in a query that exceeds the maximum allowed per-query resources. Such queries will fail with an error. When possible, prefer queries that do not use CROSS JOIN
So, is it possible to do an INNER JOIN with a between, or improve the CROSS JOIN some other way?
Upvotes: 2
Views: 3786
Reputation: 172944
This is a limitation for BigQuery Legacy SQL.
You should use BigQuery Standard SQL instead:
#standardSQL
SELECT *
FROM
`table1` a
INNER JOIN `table2` b
ON a.value BETWEEN b.start AND b.end
In standard SQL - you should use back-ticks instead of brackets.
Also keep in mind that end
is a reserved keyword, so to make above work you need to enclose it in back-ticks also.
See below (along with dummy data from your question):
#standardSQL
WITH table1 AS (
SELECT value
FROM UNNEST([1, 4, 9, 10]) AS value
),
table2 AS (
SELECT chunk.start, chunk.`end`
FROM UNNEST([STRUCT<start INT64, `end` INT64>(0,5),(6,9),(10,15)]) AS chunk
)
SELECT *
FROM `table1` a
INNER JOIN `table2` b
ON a.value BETWEEN b.start AND b.`end`
-- ORDER BY value
Upvotes: 4