Reputation: 31
I want to append new data from table_a
to table_b
every day by using a scheduled query in Google Big Query. See example below.
However, when I write the SELECT
statement which appends to table_b
the query size gets really large.
If I compare the dynamic query to the one where I use a fixed date (here: 30.05.2020
) the query size tenfolds:
SELECT
date_a as date_b,
col1,
col2,
col3
FROM `project.dataset.table_a`
WHERE
date_a > (select max(date_b) from `project.dataset.table_b`)
ORDER by date_b
table_a(partioned on date_a
; total size over 200GB) (this example is simplified as I am doing other JOINs
and UNIONs
etc. in the query)
+------------+---------+---------+------+--------------+--------------+
| date_a | col1 | col2 | col3 | others | others |
+------------+---------+---------+------+--------------+--------------+
| 27.05.2020 | henry | muller | 100$ | not relevant | not relevant |
| 28.05.2020 | jamie | fox | 200$ | not relevant | not relevant |
| 29.05.2020 | richard | branson | 20$ | not relevant | not relevant |
| 30.05.2020 | jannet | jackson | 50$ | not relevant | not relevant |
| 31.05.2020 | michael | jackson | 90$ | not relevant | not relevant |
+------------+---------+---------+------+--------------+--------------+
table_b (not partioned, total size less than 50MB)
+------------+---------+---------+------+
| date_b | col1 | col2 | col3 |
+------------+---------+---------+------+
| 27.05.2020 | henry | muller | 100$ |
| 28.05.2020 | jamie | fox | 200$ |
| 29.05.2020 | richard | branson | 20$ |
| 30.05.2020 | jannet | jackson | 50$ |
+------------+---------+---------+------+
Question:
Have you got a tip for me how I can read out the MAX(date_b) before or another query-reducing suggestion?
Upvotes: 0
Views: 438
Reputation: 31
My problem was addressed & answered in another thread I didn't see before.
Here is the other question: How to choose the latest partition in BigQuery table?
Here is the 'official' answer/ help from Google on this matter: https://cloud.google.com/bigquery/docs/querying-partitioned-tables#pseudo_column_queries_that_scan_all_partitions
Upvotes: 1