Reputation: 590
I have been using Google BigQuery for some time now, loading data via file uploads. Since I am seeing some delays with this method, I am now trying to convert my code to streaming inserts.
Looking for the best solution here. What is more correct when working with BQ:
1. Using multiple (up to 40) different streaming machines, or directing the traffic to a single endpoint (or a few) to upload the data?
2. Uploading one row at a time, or batching 100-500 events into a list and uploading that?
3. Is streaming the way to go, or should I stick with file uploads, in terms of high volumes?
Some more details:
- We are uploading ~1,500-2,500 rows per second.
- Using the .NET API.
- Data needs to be available within ~5 minutes.
I didn't find a reference for this elsewhere.
Upvotes: 1
Views: 438
Reputation: 1230
The big difference between streaming and uploading files is that streaming is intended for live data that is being produced in real time, whereas file uploads are meant for data that was stored previously.
In your case, I think streaming makes more sense. If something goes wrong, you only need to re-send the failed rows instead of the whole file, and it adapts better to the continuously growing data you seem to be producing.
In either case, follow the documented best practices and keep an eye on the quota limits, which apply to load jobs as well as to streaming inserts.
For example, when using streaming inserts you should send no more than 500 rows per request, and you can stream up to 10,000 rows per second per table.
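For reference, here is a minimal sketch of what a batched streaming insert could look like with the Google.Cloud.BigQuery.V2 .NET client. The project, dataset, table and field names are made up, and `ReadEvents()` is just a placeholder for your event source; adapt them to your schema.

```csharp
using System;
using System.Collections.Generic;
using Google.Cloud.BigQuery.V2;

class StreamingUploader
{
    const int BatchSize = 500; // stay at or below the per-request row recommendation

    // Placeholder event type, just to keep the sketch self-contained.
    record Event(string Id, DateTime CreatedAt, string Payload);

    static void Main()
    {
        // Hypothetical project/dataset/table names; replace with your own.
        var client = BigQueryClient.Create("my-project");
        var buffer = new List<BigQueryInsertRow>();

        foreach (var evt in ReadEvents())
        {
            // An insert ID per row lets BigQuery de-duplicate rows if a request is retried.
            buffer.Add(new BigQueryInsertRow(insertId: evt.Id)
            {
                { "event_id", evt.Id },
                { "created_at", evt.CreatedAt },
                { "payload", evt.Payload }
            });

            if (buffer.Count >= BatchSize)
            {
                client.InsertRows("my_dataset", "my_table", buffer);
                buffer.Clear();
            }
        }

        // Flush whatever is left over.
        if (buffer.Count > 0)
        {
            client.InsertRows("my_dataset", "my_table", buffer);
        }
    }

    static IEnumerable<Event> ReadEvents()
    {
        // Placeholder event source; in reality this would be your queue/producer.
        yield return new Event(Guid.NewGuid().ToString(), DateTime.UtcNow, "example");
    }
}
```

At ~1,500-2,500 rows per second you would fill a 500-row batch several times per second, so buffering by count (plus a small time-based flush) should keep you comfortably within the per-request limit while still making the data queryable well within your 5-minute window.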
Upvotes: 3