Joshua G

Reputation: 249

bigquery : data versioning and incremental updates

Is there any documentation on how to deal with an append-only database such as BigQuery for real-time analytics use (e.g. 100 million to 1 billion mobile user profiles that are updated daily), while keeping downtime to a few seconds?

(1) There is a post (http://hortonworks.com/blog/four-step-strategy-incremental-updates-hive/) on how to implement incremental updates in Hive, and I am wondering if a similar solution exists in BigQuery.

If I need to attach a version column (e.g. version 1, 2, 3, etc.), is there a quick way to calculate the version?
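For example, I imagine something along these lines (just a sketch; user_id and updated_at are made-up column names, and I am assuming each appended record carries a load timestamp):

SELECT
  user_id,
  updated_at,
  -- number each user's records in arrival order, so the newest row gets the highest version
  ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY updated_at ASC) AS version
FROM mydataset.mytable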

(2) How can downtime be minimized to a few seconds? I like the solution mentioned in "Eliminating duplicate records in a BigQuery Table", and I think it partially solves my challenge.

If mydataset.mytable has 100 million to 1 billion unique records (e.g. mobile user profiles), and downtime of more than a few seconds is not tolerable, is there any way to eliminate the downtime, e.g. with a virtual view? During the "bq query --replace" and "bq cp", does the table mydataset.mytable remain accessible, or is it locked until the replace and copy are completed? Thank you.

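# Step 1: rewrite the table, keeping only the rows whose keys are not in the update table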
bq query --allow_large_results --replace --destination_table=mydataset.mytable \
'SELECT * FROM mydataset.mytable
 WHERE key NOT IN (SELECT key FROM mydataset.update)'

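# Step 2: append the updated rows to the rewritten table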
bq cp --append_table mydataset.update mydataset.mytable

Upvotes: 5

Views: 5508

Answers (1)

Sean Chen

Reputation: 671

The first part of your question actually sounds like this:

Is there any other approach for updating a row in Big Query apart from overwriting the table?

The approach described there is the new "best practice" we recommend for removing duplicates (as opposed to the one linked in your question), and it is easily adapted to allow updates (based on time rather than a version number). You can adapt the partition expression to select rows based on whatever criteria are appropriate for you.
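As a rough sketch (not the exact query from the linked answer; key and updated_at stand in for your own key and timestamp columns), the pattern keeps only the newest row per key:

SELECT *
FROM (
  SELECT
    *,
    -- rank each key's rows newest-first so the latest update wins
    ROW_NUMBER() OVER (PARTITION BY key ORDER BY updated_at DESC) AS rn
  FROM mydataset.mytable
)
WHERE rn = 1

You could materialize this result with --replace, or save it as a view; combined with appending mydataset.update to the base table (as in your bq cp command), readers of such a view would always see the latest record per key without waiting for a table rewrite. If you keep an explicit version column instead, ordering by that column in the OVER clause gives the same effect.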

In answer to your second question, the table is not locked at any time during the "bq query --replace". Queries started before the "replace" job completes will run against a snapshot of the table taken at the time they started. Queries started after the "replace" job completes will see the new data.

Upvotes: 3
