Akshay Lande

Reputation: 87

Which is better for Dataflow: BigQueryIO.write() or bigquery.insertAll()?

I am developing Java code to read records from GCS and insert them into BigQuery tables. Which is better from a cost and performance perspective: BigQueryIO.write() or the bigquery.insertAll() method?

Upvotes: 1

Views: 922

Answers (2)

Felipe Hoffa

Reputation: 59175

If you are using Dataflow, your preferred method should be Beam's BigQueryIO. This class encapsulates a lot of knowledge about the best way to handle errors and about the different methods of sending data to BigQuery.

The two methods you can choose between with BigQueryIO.Write:

FILE_LOADS:

Use BigQuery load jobs to insert data. Records will first be written to files, and these files will be loaded into BigQuery. This is the default method when the input is bounded. This method can be chosen for unbounded inputs as well, as long as a triggering frequency is also set using BigQueryIO.Write.withTriggeringFrequency. BigQuery has daily quotas on the number of load jobs, so be careful not to trigger loads too often. For more information, see Loading Data from Cloud Storage.

STREAMING_INSERTS:

Use the BigQuery streaming insert API to insert data. This provides the lowest-latency insert path into BigQuery, and is therefore the default method when the input is unbounded. BigQuery will make a strong effort to ensure no duplicates when using this path; however, there are some scenarios in which BigQuery is unable to make this guarantee. A query can be run over the output table periodically to clean up these rare duplicates. Alternatively, the FILE_LOADS insert method does guarantee no duplicates, though the latency for the insert into BigQuery will be much higher. For more information, see Streaming Data into BigQuery.
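
To make the FILE_LOADS quota warning concrete, here is a rough back-of-the-envelope check in plain Java (no Beam dependency). It assumes at most one load job per table per trigger firing, which is a simplification, since a single firing can produce more than one job. The 1,500 jobs per table per day figure is an assumption for illustration; check the current BigQuery quota page for the real limit. The class and method names are made up for this sketch.

```java
import java.time.Duration;

// Hypothetical helper, not part of Beam or the BigQuery client library.
class LoadJobBudget {
    // ASSUMPTION: per-table daily load-job limit; verify against current BigQuery quotas.
    static final long ASSUMED_JOBS_PER_TABLE_PER_DAY = 1_500;

    private static final long MILLIS_PER_DAY = Duration.ofDays(1).toMillis();

    /** Load jobs per day for one table, assuming exactly one job per trigger firing. */
    static long jobsPerDay(Duration triggeringFrequency) {
        long millis = triggeringFrequency.toMillis();
        if (millis <= 0) {
            throw new IllegalArgumentException("triggering frequency must be at least 1 ms");
        }
        return MILLIS_PER_DAY / millis;
    }

    /** True if the given triggering frequency stays within the daily limit. */
    static boolean fitsQuota(Duration triggeringFrequency, long dailyLimit) {
        return jobsPerDay(triggeringFrequency) <= dailyLimit;
    }

    /** Shortest interval between loads that stays within the daily limit (rounded up). */
    static Duration shortestSafeInterval(long dailyLimit) {
        if (dailyLimit <= 0) {
            throw new IllegalArgumentException("daily limit must be positive");
        }
        long millis = (MILLIS_PER_DAY + dailyLimit - 1) / dailyLimit; // ceiling division
        return Duration.ofMillis(millis);
    }

    public static void main(String[] args) {
        Duration everyMinute = Duration.ofMinutes(1);
        Duration every30Seconds = Duration.ofSeconds(30);

        System.out.println(jobsPerDay(everyMinute));     // 1440
        System.out.println(jobsPerDay(every30Seconds));  // 2880
        System.out.println(fitsQuota(everyMinute, ASSUMED_JOBS_PER_TABLE_PER_DAY));    // true
        System.out.println(fitsQuota(every30Seconds, ASSUMED_JOBS_PER_TABLE_PER_DAY)); // false
        System.out.println(shortestSafeInterval(ASSUMED_JOBS_PER_TABLE_PER_DAY).toMillis()); // 57600
    }
}
```

Under these assumptions, a one-minute triggering frequency just fits, while thirty seconds would not. In practice you would leave headroom for retries and for any other jobs that load into the same table.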

Upvotes: 1

David

Reputation: 9721

BigQueryIO is preferable because it is part of Beam, so the pipeline understands the records being sent to BigQuery. This means the write can be monitored, retries are built in, and so on. BigQueryIO.Write also lets you choose between a load job and streaming inserts via the withMethod setting.
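
As a minimal sketch of what that looks like for the GCS-to-BigQuery case in the question: the pipeline below reads text lines from GCS and writes them with an explicit withMethod choice. The bucket path, table spec, and single-column schema are placeholders invented for illustration, and the code assumes the Beam Java SDK, its Google Cloud Platform IO module, and a runner are on the classpath.

```java
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import java.util.Arrays;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptor;

// Hypothetical example class; names and paths are placeholders.
class GcsToBigQuery {
    public static void main(String[] args) {
        Pipeline pipeline =
            Pipeline.create(PipelineOptionsFactory.fromArgs(args).withValidation().create());

        // Placeholder schema: one STRING column holding the raw input line.
        TableSchema schema = new TableSchema().setFields(Arrays.asList(
            new TableFieldSchema().setName("line").setType("STRING")));

        pipeline
            // Bounded input: text files under a placeholder GCS prefix.
            .apply("ReadFromGcs", TextIO.read().from("gs://my-bucket/input/*.txt"))
            // Convert each line to a TableRow; real code would parse the record here.
            .apply("ToTableRow", MapElements.into(TypeDescriptor.of(TableRow.class))
                .via((String line) -> new TableRow().set("line", line)))
            .setCoder(TableRowJsonCoder.of())
            .apply("WriteToBigQuery", BigQueryIO.writeTableRows()
                .to("my-project:my_dataset.my_table") // placeholder table spec
                .withSchema(schema)
                // Explicit choice; FILE_LOADS is already the default for bounded input.
                .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

        pipeline.run().waitUntilFinish();
    }
}
```

Swapping in BigQueryIO.Write.Method.STREAMING_INSERTS changes the write path without touching the rest of the pipeline, which is the main practical advantage over calling insertAll() by hand inside a DoFn.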

Upvotes: 0
