Reputation: 691
I have a BigQuery table which gets updated with rows by a batch job every one hour. The job appends rows that are modified in an external system and would be in the format below.
Supplier_ID | Supplier_Name | Supplier_Contact | Last_Modified |
---|---|---|---|
123 | ABC | 03 483 394 | 2023-05-01 12:34:56 |
124 | ABD | 02 848 939 | 2023-05-01 12:34:56 |
123 | ABC | 03 483 345 | 2023-05-02 10:45:05 |
124 | ABD | 02 848 837 | 2023-05-02 10:45:05 |
123 | ABC | 03 478 102 | 2023-05-08 11:09:15 |
I want to create a current version of the table like below which shows the latest record of all the Suppliers with only one record per Supplier_ID.
Supplier_ID | Supplier_Name | Supplier_Contact | Last_Modified |
---|---|---|---|
123 | ABC | 03 478 102 | 2023-05-08 11:09:15 |
124 | ABD | 02 848 837 | 2023-05-02 10:45:05 |
The size if the table would be in millions. What would be the best way to achieve this? I don't want to create a view since the size of the table would keep growing and the systems which would consume this new table needs a faster response time.
Option 1: Run another batch job and create the table having the max of the timestamp for each Supplier_ID. But the drawback would be creating the entire table even if only one record got updated.
Option 2: Creating materialized views. I haven't worked on this till now. But if this is better then I would explore this.
Looking for your suggestions on the best way.
Upvotes: 0
Views: 81