Jan
Jan

Reputation: 31

Decrease query size: Append new data to Big Query table based on latest date

I want to append new data from table_a to table_b every day by using a scheduled query in Google Big Query. See example below.

However, when I write the SELECT statement which appends to table_b the query size gets really large.

If I compare the dynamic query to the one where I use a fixed date (here: 30.05.2020) the query size tenfolds:

SELECT
date_a as date_b,
col1,
col2,
col3 
FROM `project.dataset.table_a`
WHERE
  date_a > (select max(date_b) from `project.dataset.table_b`)
ORDER by date_b

table_a(partioned on date_a; total size over 200GB) (this example is simplified as I am doing other JOINs and UNIONs etc. in the query)

+------------+---------+---------+------+--------------+--------------+
|    date_a  |  col1   |  col2   | col3 |    others    |    others    |
+------------+---------+---------+------+--------------+--------------+
| 27.05.2020 | henry   | muller  | 100$ | not relevant | not relevant |
| 28.05.2020 | jamie   | fox     | 200$ | not relevant | not relevant |
| 29.05.2020 | richard | branson | 20$  | not relevant | not relevant |
| 30.05.2020 | jannet  | jackson | 50$  | not relevant | not relevant |
| 31.05.2020 | michael | jackson | 90$  | not relevant | not relevant |
+------------+---------+---------+------+--------------+--------------+

table_b (not partioned, total size less than 50MB)

+------------+---------+---------+------+
|    date_b  |  col1   |  col2   | col3 |
+------------+---------+---------+------+
| 27.05.2020 | henry   | muller  | 100$ |
| 28.05.2020 | jamie   | fox     | 200$ |
| 29.05.2020 | richard | branson | 20$  |
| 30.05.2020 | jannet  | jackson | 50$  |
+------------+---------+---------+------+

Question:

Have you got a tip for me how I can read out the MAX(date_b) before or another query-reducing suggestion?

Upvotes: 0

Views: 438

Answers (1)

Jan
Jan

Reputation: 31

My problem was addressed & answered in another thread I didn't see before.

Here is the other question: How to choose the latest partition in BigQuery table?

Here is the 'official' answer/ help from Google on this matter: https://cloud.google.com/bigquery/docs/querying-partitioned-tables#pseudo_column_queries_that_scan_all_partitions

Upvotes: 1

Related Questions