Reputation: 16505
We are planning to use BigQuery for analytical purposes for our inventory system. Since this is inventory data, the record for a given storeid-productid combination is going to change very often. In terms of volume, the total number of store-product records is somewhere between 200M and 400M, and we expect about 500K mutations per day in total. The mutations arrive on Kafka topics.
From a cost standpoint, what is the optimal solution? The options are:
Option 1: A Kafka listener issues one DML statement per mutation:
UPDATE inventory SET quantity = ? WHERE productid = ? AND storeid = ?
My assessment of this option: it is the simplest of all, but it may incur higher cost because BigQuery doesn't have a notion of a primary key. Will a search index or clustering help? (A rough sketch of such a listener follows the MERGE example below.)
Option 2: Have a staging table where we store every mutation, then periodically update the main/reporting table using MERGE, something like this: https://cloud.google.com/blog/products/bigquery/performing-large-scale-mutations-in-bigquery (however, this is a 2018 article and things might have changed a lot; for example, I think the 3-hour lag mentioned there is now 30 minutes).
MERGE dataset.Inventory T
USING dataset.inventory_staging S
ON T.ProductID = S.ProductID AND T.storeid = S.storeid
WHEN MATCHED THEN
  UPDATE SET quantity = S.quantity
WHEN NOT MATCHED THEN
  INSERT (ProductID, storeid, quantity) VALUES (S.ProductID, S.storeid, S.quantity)
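For reference, here is roughly what I mean by the option 1 listener: a minimal sketch assuming a Python consumer built with kafka-python and the google-cloud-bigquery client. The topic name, table name, message format, and INT64 column types are placeholders of my own, not anything prescribed by BigQuery.

# Hypothetical sketch of option 1: one parameterized DML UPDATE per Kafka message.
# Assumes kafka-python and google-cloud-bigquery are installed and that messages
# are JSON objects like {"productid": 1, "storeid": 2, "quantity": 3}.
import json

from google.cloud import bigquery
from kafka import KafkaConsumer

client = bigquery.Client()
consumer = KafkaConsumer(
    "inventory-mutations",                    # assumed topic name
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

UPDATE_SQL = """
UPDATE `my_project.dataset.Inventory`
SET quantity = @quantity
WHERE productid = @productid AND storeid = @storeid
"""

for message in consumer:
    m = message.value
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("quantity", "INT64", m["quantity"]),
            bigquery.ScalarQueryParameter("productid", "INT64", m["productid"]),
            bigquery.ScalarQueryParameter("storeid", "INT64", m["storeid"]),
        ]
    )
    # Each call is a separate DML job; with no primary key, BigQuery has to scan
    # the (clustered) table for every statement, which is why I worry about cost.
    client.query(UPDATE_SQL, job_config=job_config).result()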
Now the second question: if we take the second approach, what is the most cost-effective way to sink a Kafka topic into BigQuery? Does Kafka -> GCS -> BQ give any advantage over streaming solutions (like a boilerplate Kafka listener that uses the Storage Write API: https://cloud.google.com/bigquery/docs/write-api#write-api-overview)?
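To make the Kafka -> GCS -> BQ variant concrete, the BigQuery leg I have in mind is a plain batch load job into the staging table; a minimal sketch, assuming the Kafka side already writes newline-delimited JSON files to a bucket (bucket path and table name below are placeholders). As I understand it, batch load jobs are not billed for ingestion, while the streaming insert / Storage Write API paths are billed per GB ingested, which is where the cost advantage of this path would come from.

# Hypothetical sketch of the GCS -> BigQuery leg: load newline-delimited JSON
# mutation files from a bucket into the staging table with a batch load job.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

# Assumed bucket layout; the Kafka -> GCS step (e.g. a GCS sink connector)
# would produce these files.
uri = "gs://my-inventory-bucket/mutations/*.json"

load_job = client.load_table_from_uri(
    uri,
    "my_project.dataset.inventory_staging",
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish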
Upvotes: 2
Views: 788
Reputation: 670
For the first question: single UPDATE statements are a no-go, and scheduled MERGEs have some trade-offs in data availability, so another option is to store all the mutations and version the data so that you can always recover the latest value per store-product key.
Then you can schedule a query that deletes or archives old versions by removing them from the source table; this reduces the amount of data your queries have to scan.
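A minimal sketch of what that versioned pattern could look like, assuming every appended mutation row carries an updated_at timestamp; the project, table, and column names are placeholders, and the cleanup statement is just one simple way to prune old versions.

# Hypothetical sketch of the append-and-version pattern: read the latest version
# per (productid, storeid), then periodically prune superseded rows.
from google.cloud import bigquery

client = bigquery.Client()

# Latest value per key, assuming each mutation row has an updated_at timestamp.
LATEST_SQL = """
SELECT productid, storeid, quantity
FROM `my_project.dataset.inventory_mutations`
WHERE TRUE
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY productid, storeid
  ORDER BY updated_at DESC
) = 1
"""

# Scheduled cleanup: keep only the newest row per key. Note this rewrites the
# table; in a real setup you would re-specify partitioning/clustering in the DDL.
CLEANUP_SQL = """
CREATE OR REPLACE TABLE `my_project.dataset.inventory_mutations` AS
SELECT *
FROM `my_project.dataset.inventory_mutations`
WHERE TRUE
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY productid, storeid
  ORDER BY updated_at DESC
) = 1
"""

latest = client.query(LATEST_SQL).result()   # current quantity per store-product key
client.query(CLEANUP_SQL).result()           # run this on a schedule, not per read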
Regarding your second question, the fastest way to solve this would be:
Kafka -> Pub/Sub (via the Pub/Sub Kafka connector) -> BigQuery (via a BigQuery subscription).
Just be aware of the limitations and the specific transformations that happen at each step:
https://cloud.google.com/pubsub/docs/connect_kafka#convert-to-pubsub
https://cloud.google.com/pubsub/docs/bigquery
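For example, creating the BigQuery subscription with the Python Pub/Sub client could look like the sketch below; the project, topic, subscription, and table IDs are placeholders, and the Kafka-to-Pub/Sub bridge from the first link is assumed to already be publishing to the topic.

# Hypothetical sketch: create a Pub/Sub BigQuery subscription so messages on the
# (Kafka-bridged) topic are written straight into a BigQuery table.
from google.cloud import pubsub_v1

project_id = "my_project"                      # placeholder
topic_id = "inventory-mutations"               # placeholder, fed by the Kafka connector
subscription_id = "inventory-mutations-to-bq"  # placeholder
bigquery_table = "my_project.dataset.inventory_staging"

publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()
topic_path = publisher.topic_path(project_id, topic_id)
subscription_path = subscriber.subscription_path(project_id, subscription_id)

bigquery_config = pubsub_v1.types.BigQueryConfig(
    table=bigquery_table,
    write_metadata=True,  # also write message metadata columns to each row
)

with subscriber:
    subscriber.create_subscription(
        request={
            "name": subscription_path,
            "topic": topic_path,
            "bigquery_config": bigquery_config,
        }
    )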
Upvotes: 0
Reputation: 6552
Running one UPDATE statement per item would be extremely expensive; you need the staging table and periodic MERGEs.
Kafka -> GCS -> BQ is the most cost-effective way.
As an additional suggestion, you could explore replacing Kafka with a Pub/Sub topic; Pub/Sub also has direct ingestion into BigQuery.
Upvotes: 2