Reputation: 19655
As data arrives in my BigQuery tables, I want to send some of it on to another database--a datamart or an operational database that serves real-time dashboards.
How do I do this? Polling the enormous BQ table is too expensive and slow, and I want updates to be frequent--close to real-time.
Strangely, I can find little information about streaming data out of BigQuery.
Upvotes: 2
Views: 335
Reputation: 59165
"Polling the enormous BQ table is too expensive and slow"
Make sure to partition your data by day, and if a single day still holds too much data, cluster it by hour.
There isn't a built-in way to stream data out of BigQuery as it arrives, but if you partition and cluster your data appropriately, each polling scan will be far less costly than running it against a naive, unpartitioned table.
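As an illustration only, here is a minimal sketch of creating such a table with the google-cloud-bigquery Python client; the project, dataset, table, and column names (event_ts, event_hour, payload) are placeholder assumptions, not anything from your setup:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Placeholder schema -- substitute your own columns.
schema = [
    bigquery.SchemaField("event_ts", "TIMESTAMP"),
    bigquery.SchemaField("event_hour", "INTEGER"),
    bigquery.SchemaField("payload", "STRING"),
]

table = bigquery.Table("my-project.my_dataset.events", schema=schema)

# Partition by day on the event timestamp...
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_ts",
)
# ...and cluster by hour so intra-day scans touch even less data.
table.clustering_fields = ["event_hour"]

client.create_table(table)

# A poll then prunes to one partition (and one cluster) instead of
# scanning the whole table:
query = """
    SELECT payload
    FROM `my-project.my_dataset.events`
    WHERE DATE(event_ts) = CURRENT_DATE()
      AND event_hour = EXTRACT(HOUR FROM CURRENT_TIMESTAMP())
"""
rows = client.query(query).result()
```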
For real time: would it be an option to split the data between BigQuery and the other tools from inside the pipeline, instead of after it has been stored in BQ?
In reply to the comment:
"I would rather not alter each of clients to write to two targets, BQ plus PubSub"
Have each client write only to Pub/Sub. Then use the click-to-deploy Dataflow template that writes from Pub/Sub to BigQuery--that's the most reliable pipeline. Any other consumer can then subscribe to the same Pub/Sub topic that feeds BigQuery; see the subscriber sketch below.
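For the consumer side, a minimal sketch of a subscriber that feeds an operational dashboard store, using the google-cloud-pubsub Python client; the project ID, subscription name, and the write_to_dashboard_db() helper are hypothetical placeholders:

```python
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
# Hypothetical project and subscription -- create a dedicated
# subscription on the same topic that feeds BigQuery.
subscription_path = subscriber.subscription_path(
    "my-project", "events-dashboard-sub"
)


def write_to_dashboard_db(payload: bytes) -> None:
    # Placeholder: insert/upsert into your datamart or operational DB here.
    print(f"writing {payload!r} to the dashboard store")


def callback(message) -> None:
    write_to_dashboard_db(message.data)
    message.ack()  # only ack after the write succeeds


streaming_pull = subscriber.subscribe(subscription_path, callback=callback)
print(f"Listening on {subscription_path}...")
try:
    streaming_pull.result()  # block the main thread; callbacks run in a pool
except KeyboardInterrupt:
    streaming_pull.cancel()
```

Because Pub/Sub fans out per subscription, this consumer receives every message independently of the pipeline loading BigQuery, so neither path slows the other down.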
Upvotes: 2