Performance difference between JSON loading methods in BigQuery

Question

What is the performance difference between these two ways of loading JSON data into BigQuery: load_table_from_file(io.StringIO(json_data)) vs create_rows_json?

The first one loads the file as a whole and the second one streams the data. Does that mean the first method will be faster to complete but all-or-nothing, while the second will be slower but can succeed or fail for individual rows? Any other concerns? Thanks!

Pentium10 · Accepted Answer

The two methods serve different use cases, and each has its own limits.

Loading from a file is great if you can place your data in files. A file can be up to 5 TB in size. Load jobs are free. You can query the data as soon as the load job completes.
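A minimal sketch of the file-load path, assuming the destination table already exists with a schema and that client is an authenticated google.cloud.bigquery Client. The rows are serialized as newline-delimited JSON, the layout BigQuery load jobs use for JSON. It uses io.BytesIO rather than the question's io.StringIO because the client documents file_obj as a file opened in binary mode. The helper names to_ndjson_bytes and load_rows are hypothetical.

```python
import io
import json


def to_ndjson_bytes(rows):
    """Serialize a list of dicts as newline-delimited JSON in a binary buffer."""
    text = "\n".join(json.dumps(row) for row in rows)
    return io.BytesIO(text.encode("utf-8"))


def load_rows(client, rows, table_id):
    """Hypothetical helper: send all rows to table_id in a single load job."""
    # Imported lazily so the serialization helper works without the package installed.
    from google.cloud import bigquery

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    )
    job = client.load_table_from_file(
        to_ndjson_bytes(rows), table_id, job_config=job_config
    )
    # Blocks until the job finishes and raises if it failed; with the default
    # settings (no bad records allowed) one bad row fails the whole job.
    job.result()
    return job
```

Because the whole batch goes through one job, the outcome is all-or-nothing under default settings. That matches the "loads the file as a whole" behaviour described in the question.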
Streaming inserts are great if your data arrives as events that you can stream to BigQuery. A single streaming insert request is limited to 10 MB, but requests can be heavily parallelized, up to 1 million rows per second, which is a big scale. Streaming rows into BigQuery has its own cost. You can query the data immediately after streaming, but copy and export jobs may not see it for up to 90 minutes.
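A minimal sketch of the streaming path under the 10 MB per-request limit mentioned above. It assumes a client object with the insert_rows_json method (later releases of google-cloud-bigquery expose the question's create_rows_json under that name). The 9 MB budget, which leaves headroom for request overhead, and the 500-row cap are assumptions rather than values from the answer. The helper names batch_rows and stream_rows are hypothetical.

```python
import json


def batch_rows(rows, max_bytes=9_000_000, max_rows=500):
    """Yield lists of rows whose serialized size stays under max_bytes.

    max_bytes (headroom below the 10 MB request limit) and max_rows are
    assumed values. A single row larger than max_bytes is yielded on its own.
    """
    batch, size = [], 0
    for row in rows:
        row_size = len(json.dumps(row).encode("utf-8"))
        if batch and (size + row_size > max_bytes or len(batch) >= max_rows):
            yield batch
            batch, size = [], 0
        batch.append(row)
        size += row_size
    if batch:
        yield batch


def stream_rows(client, rows, table_id):
    """Hypothetical helper: stream rows in batches and collect per-row errors."""
    failed = []
    offset = 0
    for batch in batch_rows(rows):
        # insert_rows_json returns a list of error mappings; empty means success.
        errors = client.insert_rows_json(table_id, batch)
        for err in errors:
            # err["index"] is relative to the batch, so shift it to the full list.
            failed.append((offset + err["index"], err["errors"]))
        offset += len(batch)
    return failed
```

Errors come back per row rather than as a single job failure. By default (skip_invalid_rows=False), however, an invalid row causes its whole request to be rejected, so the effective granularity is one batch unless that flag is set.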


