Reputation: 915
Has anyone had experience with:
Can someone provide info on latency?
Thanks
Upvotes: 2
Views: 907
Reputation: 17913
In question 1, I believe you're interested in BigQuery ingestion latency. Per the "Streaming Data into BigQuery" documentation, streamed data is available for real-time analysis within a few seconds of the first streaming insertion into a table. This latency is low, but it will probably dominate whatever latency you incur from raw network communication between an Amazon cluster and the BigQuery API.
In question 2, you're probably interested in the latency of Dataflow itself. That is, assuming data arrives in a Dataflow streaming pipeline in real time (e.g. via Pub/Sub), and you process it and ultimately write the results to Amazon, you want to know how quickly the results come back.
This depends heavily on the windowing structure of your pipeline (e.g., if you window data into 5-minute windows, data will be buffered for up to 5 minutes before results are emitted). If you don't do any windowing at all, the latency introduced by Dataflow itself should be low (sub-second). For details of how that is achieved, you can consult the MillWheel paper, on which Dataflow's streaming engine is based.
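To make the windowing point concrete, here is a rough sketch in plain Python (not the Dataflow API) of how long an element sits in a fixed window before its result can be emitted. It assumes results are produced only when the window closes, and it ignores watermark lag, early triggers, and network time. The function names `fixed_window_end` and `buffering_delay` are made up for illustration.

```python
def fixed_window_end(event_time_s: float, window_size_s: float) -> float:
    """Return the end timestamp of the fixed window containing the event.

    Windows are assumed to be aligned to multiples of window_size_s,
    i.e. [0, size), [size, 2*size), and so on.
    """
    window_start = (event_time_s // window_size_s) * window_size_s
    return window_start + window_size_s


def buffering_delay(event_time_s: float, window_size_s: float) -> float:
    """Seconds an event waits until its window closes (assumed emit time)."""
    return fixed_window_end(event_time_s, window_size_s) - event_time_s


if __name__ == "__main__":
    window = 5 * 60  # 5-minute windows, as in the example above
    for t in (0.0, 10.0, 150.0, 299.0):
        delay = buffering_delay(t, window)
        print(f"event at t={t:>5.1f}s waits {delay:>5.1f}s for its window to close")
```

Under these assumptions, an element arriving right at the start of a 5-minute window waits the full 300 seconds, while one arriving just before the window closes waits about a second. The window size therefore sets the worst-case added latency, regardless of how fast the engine itself is.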
Upvotes: 1