FZF

Reputation: 915

What is the streaming log data latency between AWS & Google cloud services?

Has anyone had experience with:

  1. Sending streamed or micro-batched log data from Amazon (AWS) to BigQuery for processing? Can you shed light on any latency issues?
  2. Sending (micro-batched) logs from Google Cloud Dataflow to Amazon (Kinesis / S3 / DynamoDB)?

Can someone provide info on latency?

Thanks

Upvotes: 2

Views: 907

Answers (1)

jkff

Reputation: 17913

In question 1, I believe you're interested in BigQuery ingestion latency. Per the "Streaming Data into BigQuery" documentation, "streamed data is available for real-time analysis within a few seconds of the first streaming insertion into a table." That latency is low, but it will probably still dominate any latency caused by raw network communication from an Amazon cluster to the BigQuery API.
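If you micro-batch on the AWS side, the batching delay adds to BigQuery's few-second availability delay. Below is a minimal sketch of a size- and age-bounded micro-batcher, assuming a hypothetical `send` callback that ships one batch (in practice it might wrap `Client.insert_rows_json` from the google-cloud-bigquery library). The class name and parameters are illustrative, not part of any library. Time is passed in explicitly so the behaviour is easy to test.

```python
from typing import Any, Callable, List, Optional


class MicroBatcher:
    """Hypothetical helper: buffers records, flushing when the batch is full
    or when the oldest buffered record has waited max_age_s seconds.

    The age bound caps the extra latency that batching adds to any record.
    """

    def __init__(self, send: Callable[[List[Any]], None], max_size: int, max_age_s: float):
        self.send = send            # ships one batch (e.g. a streaming insert)
        self.max_size = max_size    # flush when this many records are buffered
        self.max_age_s = max_age_s  # flush when the oldest record is this old
        self._buf: List[Any] = []
        self._oldest_ts: Optional[float] = None

    def add(self, record: Any, now: float) -> None:
        # Remember when the first record of a new batch arrived.
        if not self._buf:
            self._oldest_ts = now
        self._buf.append(record)
        self._maybe_flush(now)

    def tick(self, now: float) -> None:
        # Call periodically so a partial batch still goes out once it is old enough.
        self._maybe_flush(now)

    def _maybe_flush(self, now: float) -> None:
        if not self._buf:
            return
        too_big = len(self._buf) >= self.max_size
        too_old = now - self._oldest_ts >= self.max_age_s
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        # Hand off the current batch and start a fresh buffer.
        if self._buf:
            batch, self._buf = self._buf, []
            self._oldest_ts = None
            self.send(batch)
```

With `max_age_s=2.0`, no record waits more than about two seconds (plus the `tick` interval) before being sent, so the end-to-end delay is roughly that bound plus network time plus BigQuery's own few seconds. Larger batches mean fewer requests but more waiting.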

In question 2, you're probably interested in the latency of Dataflow itself: assuming data arrives at a Dataflow streaming pipeline in real time (e.g. via Pub/Sub), and you process it and ultimately write the results to Amazon, how quickly do those results come out the other end?
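One way to measure this empirically is to stamp each record with its creation time at the source and compare it with the time it lands in Amazon. The sketch below assumes you have already collected `(event_time, arrival_time)` pairs in seconds from clocks that are synchronized (e.g. via NTP); any skew between the Google and AWS machines shows up directly as measurement error. The function name is hypothetical, and it uses the simple nearest-rank percentile definition.

```python
import math
from typing import Iterable, Tuple


def latency_percentile(pairs: Iterable[Tuple[float, float]], pct: float) -> float:
    """Hypothetical helper: nearest-rank percentile of (arrival - event) latencies.

    pairs: (event_time, arrival_time) tuples in seconds, from synchronized clocks.
    pct:   percentile in [0, 100], e.g. 50 for the median or 99 for the tail.
    """
    latencies = sorted(arrival - event for event, arrival in pairs)
    if not latencies:
        raise ValueError("no samples")
    # Nearest-rank: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct / 100 * len(latencies)))
    return latencies[rank - 1]
```

Reporting a tail percentile such as p99 alongside the median is usually more informative than an average, since occasional slow batches or retries are what you notice in practice.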

This depends heavily on the windowing structure of your pipeline: for example, if you window data into 5-minute windows, data will be buffered accordingly, so a result can lag its inputs by up to roughly the window length. If you don't do any windowing at all, the latency introduced by Dataflow itself should be low (sub-second). For details on how that is achieved, see the MillWheel paper, on which Dataflow's streaming engine is based.
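To see how fixed windows translate into buffering delay, here is an idealized sketch. It assumes a window's result is emitted exactly when the window closes (default-style triggering, with no watermark lag, late data, or processing time), so these numbers are a lower bound on real pipeline latency. The function names are hypothetical.

```python
def fixed_window_end(t: float, size: float) -> float:
    """End of the fixed window [k*size, (k+1)*size) that contains time t."""
    return (t // size + 1) * size


def buffering_delay(t: float, size: float) -> float:
    """Idealized wait before a record at time t can appear in its window's output."""
    return fixed_window_end(t, size) - t


if __name__ == "__main__":
    window = 300.0  # 5-minute windows, in seconds
    for t in (0.0, 150.0, 299.0):
        print(t, buffering_delay(t, window))
```

A record arriving right after a window opens waits nearly the full 5 minutes, while one arriving just before it closes waits about a second. For records arriving at a steady rate the average wait is about half the window length, which is why dropping windowing (or adding early-firing triggers) is what gets latency down to the sub-second range.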

Upvotes: 1
