user9984628
user9984628

Reputation: 53

Querying a Partitioned table in BigQuery using a reference from a joined table

I would like to run a query that partitions table A using a value from table B. For example:

#standard SQL
select A.user_id
from my_project.xxx A
inner join my_project.yyy B
on A._partitiontime = timestamp(B.date)
where B.date = '2018-01-01'

This query will scan all the partitions in table A and will not take into consideration the date I specified in the where clause (for partitioning purposes). I have tried running this query in several different ways but all produced the same result - scanning all partitions in table A. Is there any way around it?

Thanks in advance.

Upvotes: 5

Views: 2781

Answers (2)

Yun Zhang
Yun Zhang

Reputation: 5503

With BigQuery scripting (Beta now), there is a way to prune the partitions.

Basically, a scripting variable is defined to capture the dynamic part of a subquery. Then in subsequent query, scripting variable is used as a filter to prune the partitions to be scanned.

DECLARE date_filter ARRAY<DATETIME> 
  DEFAULT (SELECT ARRAY_AGG(date) FROM B WHERE ...);

select A.user_id
from my_project.xxx A
inner join my_project.yyy B
on A._partitiontime = timestamp(B.date)
where A._partitiontime IN UNNEST(date_filter)

Upvotes: 3

Pentium10
Pentium10

Reputation: 207912

The doc says this about your use case:

Express the predicate filter as closely as possible to the table identifier. Complex queries that require the evaluation of multiple stages of a query in order to resolve the predicate (such as inner queries or subqueries) will not prune partitions from the query.

The following query does not prune partitions (note the use of a subquery):

#standardSQL
SELECT
  t1.name,
  t2.category
FROM
  table1 t1
INNER JOIN
  table2 t2
ON
  t1.id_field = t2.field2
WHERE
  t1.ts = (SELECT timestamp from table3 where key = 2)

Upvotes: 1

Related Questions