briler

Reputation: 590

BigQuery streaming best practice

I have been using Google BigQuery for some time now, loading data via file uploads. Since I am seeing some delays with this method, I am now trying to convert my code to streaming.

I'm looking for the best solution here. What is the more correct way of working with BQ?

  1. Using multiple (up to 40) different streaming machines, or directing traffic to a single endpoint (or a few) to upload the data?
  2. Uploading one row at a time, or stacking events into a list of 100-500 and uploading that?
  3. Is streaming the way to go, or should I stick with file uploads, in terms of high volumes?

Some more data:

  - We are uploading ~1500-2500 rows per second.
  - We are using the .NET API.
  - The data needs to be available within ~5 minutes.

I couldn't find a reference for this elsewhere.

Upvotes: 1

Views: 438

Answers (1)

Mario

Reputation: 1230

The big difference between streaming data and uploading files is that streaming is intended for live data that is produced in real time as it is streamed, whereas with file uploads you upload data that was stored previously.

In your case, I think streaming makes more sense. If something goes wrong, you only need to re-send the failed rows instead of the whole file, and it adapts better to the continuously growing data that I think you're receiving.

The best practices in any case are:

  1. Try to reduce the number of sources that send the data.
  2. Send bigger chunks of data in each request instead of many tiny chunks.
  3. Use exponential back-off to retry requests that fail due to server errors (these are common and should be expected). A sketch covering this and point 2 follows the list.
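Since you mentioned the .NET API, here is a rough sketch of points 2 and 3 using the Google.Cloud.BigQuery.V2 client. The project, dataset, table and column names, the batch size and the retry parameters are placeholders you would adapt to your own setup; the per-row insert IDs let BigQuery de-duplicate rows that get re-sent after a retry.

    using System;
    using System.Collections.Generic;
    using System.Threading;
    using Google;
    using Google.Cloud.BigQuery.V2;

    class StreamingUploader
    {
        private readonly BigQueryClient _client = BigQueryClient.Create("my-project-id");

        // Sends one batch of events, retrying on server errors with exponential back-off.
        public void InsertBatch(IReadOnlyList<IDictionary<string, object>> events)
        {
            var rows = new List<BigQueryInsertRow>();
            foreach (var ev in events)
            {
                // An insert ID per row lets BigQuery de-duplicate retried rows.
                var row = new BigQueryInsertRow(insertId: Guid.NewGuid().ToString());
                foreach (var field in ev)
                {
                    row.Add(field.Key, field.Value);
                }
                rows.Add(row);
            }

            var delay = TimeSpan.FromMilliseconds(500);
            for (int attempt = 1; ; attempt++)
            {
                try
                {
                    _client.InsertRows("my_dataset", "my_events", rows);
                    return;
                }
                catch (GoogleApiException) when (attempt < 5)
                {
                    // Server errors happen now and then: back off and retry the
                    // same batch instead of re-sending everything.
                    Thread.Sleep(delay);
                    delay = TimeSpan.FromMilliseconds(delay.TotalMilliseconds * 2);
                }
            }
        }
    }

At ~1500-2500 rows per second, a single machine that buffers incoming events and flushes a batch like this every ~100 ms would send roughly 150-250 rows per request, which already fits the limits mentioned below.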

There are certain limits that apply to Load Jobs as well as to Streaming inserts.

For example, when streaming you should insert fewer than 500 rows per request and at most 10,000 rows per second per table.
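If your buffer grows beyond that between flushes, a simple way to stay under the 500-rows-per-request guideline is to split it into chunks before sending. Again just a sketch: sendBatch stands in for an insert call like the one above.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class Batcher
    {
        const int MaxRowsPerRequest = 500;

        // Splits a buffered list of events into chunks of at most 500 rows
        // and hands each chunk to the actual insert call.
        public static void Flush(
            IReadOnlyList<IDictionary<string, object>> buffered,
            Action<IReadOnlyList<IDictionary<string, object>>> sendBatch)
        {
            for (int offset = 0; offset < buffered.Count; offset += MaxRowsPerRequest)
            {
                var chunk = buffered.Skip(offset).Take(MaxRowsPerRequest).ToList();
                sendBatch(chunk);
            }
        }
    }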

Upvotes: 3
