Materialized View for Latest Rows by ID in a BigQuery Table?

Question

I have a BigQuery table with ~5k unique IDs. Every day new rows are inserted for IDs that may or may not already exist.

We use this query to find the most recent rows:

SELECT t.* EXCEPT (seqnum)
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY id
                                ORDER BY date_of_data DESC
                               ) AS seqnum
      FROM `[project]`.[dataset].[table] t
     ) t
WHERE seqnum = 1

Although we only want the most recent row for each ID, this query must scan the entire table, so it gets slower and more expensive every day as the table grows. Right now, for an 8GB table, the query above produces a 22MB result. We would much rather query a 22MB table if it could be kept up to date.
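To make the desired result concrete, here is a minimal Python sketch of the same per-ID selection on a few in-memory rows. The column names id and date_of_data come from the query above; the value column and all sample values are hypothetical. Ties on date_of_data are resolved here by keeping the first row seen, which is an assumption, since ROW_NUMBER() does not guarantee an order among ties.

```python
from datetime import date

# Hypothetical sample rows: "value" and every value below are made up.
rows = [
    {"id": "a", "date_of_data": date(2021, 1, 1), "value": 10},
    {"id": "a", "date_of_data": date(2021, 1, 3), "value": 12},
    {"id": "b", "date_of_data": date(2021, 1, 2), "value": 7},
]


def latest_rows(rows):
    """Return one row per id: the one with the greatest date_of_data."""
    latest = {}
    for row in rows:  # every row is visited once, like the full table scan
        current = latest.get(row["id"])
        if current is None or row["date_of_data"] > current["date_of_data"]:
            latest[row["id"]] = row
    return list(latest.values())


result = latest_rows(rows)
for row in result:
    print(row["id"], row["date_of_data"], row["value"])
```

The output has one row per ID, which is why the result stays small (22MB) while the input (8GB) keeps growing.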

Is it possible to create a materialized view that gets the latest rows for each ID?

Is there a better solution than growing tables to infinity?

Other requirements:

Keep historical data (somewhere)
Can't use UPDATE statements: we would run more than 1,500 per day (see https://cloud.google.com/bigquery/quotas)
