ΑΘΩ
ΑΘΩ

Reputation: 121

Advice on changing the partition field for dynamic BigQuery tables

I am dealing with the following issue: I have a number of tables imported into BigQuery from an external source via AirByte with _airbyte_emitted_at as the default setting for partition field.

As this default choice for a partition field is not very lucrative, the need to change the partition field naturally presents itself. I am aware of the method available for changing partitions of existing tables, by means of a CREATE TABLE FROM SELECT * statement, however the new tables thus created - essentially copies of the original ones, with modified partition fields - will be mere static snapshots and no longer dynamically update, as the originals do each time new data is recorded in the external source.

Given such a context, what would the experienced members of this forum suggest as a solution to the problem?

Being that I am a relative beginner in such matters, I apologise in advance for any potential lack of clarity. I look forward to improving the clarity, should there be any suggestions to do so from interested readers & users of this forum.

Upvotes: 0

Views: 309

Answers (1)

Sayan Bhattacharya
Sayan Bhattacharya

Reputation: 1368

I can think of 2 approaches to overcome this.

Approach 1 :

You can use Scheduled queries to copy the newly inserted rows to your 2nd table. You have to write the query in such a way that it will always select the latest rows from your main table and once you have that you can use Insert Into statement to append the rows in your 2nd table.
Since Schedule queries run at specific times the only drawback will be the the 2nd table will not get updated immediately whenever there is a new row in the main table, it will get the latest data whenever the Scheduled Query runs.
If you do not wish to have the latest data always in your 2nd table then this approach is the easier one to achieve.

Approach 2 :

You can trigger Cloud Actions for BigQuery events such as Insert, delete, update etc. Whenever a new row gets inserted in your main table ,using Cloud Run Actions you can insert that new data in your 2nd table. You can follow this article , here a detailed solution has been given.
If you wish to have the latest data always in your 2nd table then this would be a good way to do so.

Upvotes: 2

Related Questions