David542
David542

Reputation: 110502

Using the append model to do partial row updates in BigQuery

Suppose I have the following record in BQ:

id        name           age    timestamp
1         "tom"          20     2019-01-01

I then perform two "updates" on this record by using the streaming API to 'append' additional data -- https://cloud.google.com/bigquery/streaming-data-into-bigquery. This is mainly to get around the update quota that BQ enforces (and it is a high-write application we have).

I then append two edits to the table, one update that just modifies the name, and then one update that just modifies the age. Here are the three records after the updates:

id        name           age    timestamp
1         "tom"          20     2019-01-01
1         "Tom"          null   2019-02-01
1         null           21     2019-03-03

I then want to query this record to get the most "up-to-date" information. Here is how I have started:

SELECT id, **name**, **age**,max(timestamp)
FROM table
GROUP BY id 
-- 1,"Tom",21,2019-03-03

How would I get the correct name and age here? Note that there could be thousands of updates to a record, so I don't want to have to write 1000 case statements, if at all possible.

For various other reasons, I usually won't have all row data at one time, I will only have the RowID + FieldName + FieldValue.

I suppose plan B here is to do a query to get the current data and then add my changes to insert the new row, but I'm hoping there's a way to do this in one go without having to do two queries.

Upvotes: 1

Views: 423

Answers (2)

saifuddin778
saifuddin778

Reputation: 7298

This is a classic case of application of analytic functions in Standard SQL.

Here is how you can achieve your results:

select id, name, age from (
  select id, name, age, ts, rank() over (partition by id order by ts desc) rnk
  from `yourdataset.yourtable`
)
where rnk = 1

This will sub-group your records based id and pick the one with most recent ts (indicating the record most recently added for a given id).

Upvotes: 0

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173191

Below is for BigQuery Standard SQL

#standardSQL
SELECT id, 
  ARRAY_AGG(name IGNORE NULLS ORDER BY ts DESC LIMIT 1)[OFFSET(0)] name,
  ARRAY_AGG(age IGNORE NULLS ORDER BY ts DESC LIMIT 1)[OFFSET(0)] age,
  MAX(ts) ts  
FROM `project.dataset.table`
GROUP BY id   

You can test, play with above using sample data from your question as in below example

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 id, "tom" name, 20 age, DATE '2019-01-01' ts UNION ALL
  SELECT 1, "Tom", NULL, '2019-02-01' UNION ALL
  SELECT 1, NULL, 21, '2019-03-03' 
)
SELECT id, 
  ARRAY_AGG(name IGNORE NULLS ORDER BY ts DESC LIMIT 1)[OFFSET(0)] name,
  ARRAY_AGG(age IGNORE NULLS ORDER BY ts DESC LIMIT 1)[OFFSET(0)] age,
  MAX(ts) ts  
FROM `project.dataset.table`
GROUP BY id   

with result

Row id  name    age ts   
1   1   Tom 21  2019-03-03  

Upvotes: 4

Related Questions