Using the append model to do partial row updates in BigQuery

Question

Suppose I have the following record in BQ:

id        name           age    timestamp
1         "tom"          20     2019-01-01

I then perform two "updates" on this record by using the streaming API to 'append' additional data -- https://cloud.google.com/bigquery/streaming-data-into-bigquery. This is mainly to get around the update quota that BQ enforces (and it is a high-write application we have).

I then append two edits to the table, one update that just modifies the name, and then one update that just modifies the age. Here are the three records after the updates:

id        name           age    timestamp
1         "tom"          20     2019-01-01
1         "Tom"          null   2019-02-01
1         null           21     2019-03-03

I then want to query this record to get the most "up-to-date" information. Here is how I have started:

SELECT id, **name**, **age**,max(timestamp)
FROM table
GROUP BY id 
-- 1,"Tom",21,2019-03-03

How would I get the correct name and age here? Note that there could be thousands of updates to a record, so I don't want to have to write 1000 case statements, if at all possible.

For various other reasons, I usually won't have all row data at one time, I will only have the RowID + FieldName + FieldValue.

I suppose plan B here is to do a query to get the current data and then add my changes to insert the new row, but I'm hoping there's a way to do this in one go without having to do two queries.

Mikhail Berlyant · Accepted Answer

Below is for BigQuery Standard SQL

#standardSQL
SELECT id, 
  ARRAY_AGG(name IGNORE NULLS ORDER BY ts DESC LIMIT 1)[OFFSET(0)] name,
  ARRAY_AGG(age IGNORE NULLS ORDER BY ts DESC LIMIT 1)[OFFSET(0)] age,
  MAX(ts) ts  
FROM `project.dataset.table`
GROUP BY id

You can test, play with above using sample data from your question as in below example

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 id, "tom" name, 20 age, DATE '2019-01-01' ts UNION ALL
  SELECT 1, "Tom", NULL, '2019-02-01' UNION ALL
  SELECT 1, NULL, 21, '2019-03-03' 
)
SELECT id, 
  ARRAY_AGG(name IGNORE NULLS ORDER BY ts DESC LIMIT 1)[OFFSET(0)] name,
  ARRAY_AGG(age IGNORE NULLS ORDER BY ts DESC LIMIT 1)[OFFSET(0)] age,
  MAX(ts) ts  
FROM `project.dataset.table`
GROUP BY id

with result

Row id  name    age ts   
1   1   Tom 21  2019-03-03

Using the append model to do partial row updates in BigQuery

Answers (2)

Related Questions