Joshua Fox

How do I stream updates from BigQuery?

As data arrives in my BigQuery tables, I want to send some of it to another database: a datamart, or an operational database that serves real-time dashboards.

How do I do this? Polling the enormous BQ table is too expensive and slow, and I want updates to be frequent, close to real time.

Strangely, I can find little information about streaming data out of BigQuery.

Answers (1)

Felipe Hoffa

"Polling the enormous BQ table is too expensive and slow"

Make sure to partition your data by day, and if each day still holds too much data, also cluster it by hour (for example, on the event timestamp).

BigQuery has no built-in way to stream data out as it arrives, but if you partition and cluster your table appropriately, a query for only the newest rows scans far less data, and costs far less, than the same query against an unpartitioned table.
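If some polling is unavoidable, a common way to keep each poll cheap is to remember the newest timestamp already copied (a watermark) and ask only for rows after it. The Python sketch below only builds the SQL text; the table name, the `event_ts` column, and the helper `build_incremental_query` are hypothetical, and the table is assumed to be partitioned by `DATE(event_ts)` and clustered on `event_ts`. Because the filter compares the partitioning column directly with a constant, BigQuery should be able to skip every partition older than the watermark instead of scanning the whole table.

```python
from datetime import datetime, timezone


def build_incremental_query(table: str, ts_column: str, watermark: datetime) -> str:
    """Return SQL that selects only rows newer than `watermark`.

    Hypothetical setup: `table` is partitioned by DATE(ts_column) and
    clustered on ts_column. Comparing the partitioning column directly with
    a constant lets BigQuery prune partitions older than the watermark.
    `table` and `ts_column` are interpolated as-is, so they must be trusted
    identifiers, never user input.
    """
    if watermark.tzinfo is None:
        raise ValueError("watermark must be timezone-aware")
    # Normalize to UTC and format as a BigQuery TIMESTAMP literal.
    ts = watermark.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f UTC")
    return (
        "SELECT *\n"
        f"FROM `{table}`\n"
        f"WHERE {ts_column} > TIMESTAMP '{ts}'\n"
        f"ORDER BY {ts_column}"
    )


# Usage with hypothetical names: poll for rows newer than the last copied one.
query = build_incremental_query(
    "my_project.my_dataset.events",
    "event_ts",
    datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc),
)
print(query)
```

After each poll, set the watermark to the largest `event_ts` you received. One caveat: rows streamed in late, with an `event_ts` older than the current watermark, would be skipped; re-reading a short overlap window and de-duplicating downstream is one way to handle that. In real code, passing the timestamp as a query parameter is preferable to string interpolation.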

For real time: could you split the data in the ingestion pipeline, sending it to both BigQuery and your other tools, instead of reading it back out after it has been stored in BigQuery?


In reply to the comment:

"I would rather not alter each of clients to write to two targets, BQ plus PubSub"

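Have each client write only to Pub/Sub. Then deploy a pipeline that reads from Pub/Sub and writes to BigQuery (the Google-provided Dataflow template for this can be launched in a few clicks); that gives you the most reliable pipeline. Other consumers, such as the job that feeds your datamart, can then attach their own subscriptions to the same Pub/Sub topic that feeds BigQuery.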
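This fan-out works because a Pub/Sub topic delivers a copy of each message to every subscription attached to it. The sketch below models that with plain Python classes; `Topic`, `Subscription`, and the subscription names are hypothetical in-memory stand-ins, not the google-cloud-pubsub client API. Clients publish once; the BigQuery loader takes every message, while the dashboard loader keeps only the subset it needs.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


@dataclass
class Subscription:
    """Stand-in for a Pub/Sub subscription: its own queue of pending messages."""
    name: str
    pending: Deque[dict] = field(default_factory=deque)

    def pull(self, max_messages: int = 100) -> List[dict]:
        # Remove and return up to max_messages messages, oldest first.
        batch = []
        while self.pending and len(batch) < max_messages:
            batch.append(self.pending.popleft())
        return batch


class Topic:
    """In-memory stand-in for a Pub/Sub topic (NOT the real client API).

    Every published message is copied into every attached subscription,
    so several independent consumers can read the same stream.
    """

    def __init__(self) -> None:
        self.subscriptions: Dict[str, Subscription] = {}

    def subscribe(self, name: str) -> Subscription:
        sub = Subscription(name)
        self.subscriptions[name] = sub
        return sub

    def publish(self, message: dict) -> None:
        for sub in self.subscriptions.values():
            sub.pending.append(dict(message))  # each subscription gets its own copy


# Hypothetical wiring: one topic, two independent consumers.
topic = Topic()
bq_sub = topic.subscribe("to-bigquery")
dash_sub = topic.subscribe("to-dashboard")

# Clients publish once, to the topic only.
for event in [{"user": "a", "amount": 5}, {"user": "b", "amount": 50}]:
    topic.publish(event)

# Consumer 1: the BigQuery loader takes everything (here, just a list).
bigquery_rows = bq_sub.pull()

# Consumer 2: the dashboard loader keeps only the subset it needs.
dashboard_rows = [m for m in dash_sub.pull() if m["amount"] >= 10]
```

As with real Pub/Sub, a subscription only receives messages published after it was created, so create the datamart's subscription before relying on it. The real service also adds acknowledgements and redelivery, which this sketch leaves out.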
