Johnny V
Johnny V

Reputation: 1238

BigQuery: very slow join on date and _table_suffix

In my query I wish to get data from a partitioned table, where each user requires a different partition depending on a date particular to the user. This requires me to join on user and date = partition. This join is very, very slow. Taking 12 minutes or so to run!

The basic structure of the query is below:

SELECT
 t1.user_id
 ,t1.date
 ,t2.some_field
FROM `t1`   AS t1
JOIN `t2_*` AS t2
 ON t1.user_id = t2.user_id
 AND t2._table_suffix = FORMAT_DATE("%Y%m%d", t1.date)

How can I get the same result quicker?

Upvotes: 0

Views: 564

Answers (1)

Johnny V
Johnny V

Reputation: 1238

Some things that sped this query up (from 12 mins to 1 min):

  1. Pre filtering the t2 table to the partitions relevant to the join with t1
  2. Adding a date field to the CTE of t2 to make the join with t1 quicker

This looks like so:

WITH t1_dates AS (
 SELECT
 MIN(date) min_date
 ,MAX(date max_date
 FROM
 `t1`
)

, t2_v2 AS (
 SELECT
 user_id
 ,some_field
 ,PARSE_DATE("%Y%m%d", t2._table_suffix) date

 from t2_*
 ,t1_dates

 where _table_suffix BETWEEN FORMAT_DATE("%Y%m%d",t1_dates.min_date) 
                         AND FORMAT_DATE("%Y%m%d",t1_dates.max_date) 
)

SELECT
 t1.user_id
 ,t1.date
 ,t2.some_field
FROM `t1`   AS t1
JOIN t2_v2 AS t2
 ON t1.user_id = t2.user_id
 AND t2.date = t1.date

Upvotes: 1

Related Questions