Reputation: 1094
Is it possible to set the BigQuery job ID, or to get it while the batch pipeline is running?
I know it's possible using the BigQuery API, but is it possible when using BigQueryIO from Apache Beam?
I need to send an acknowledgement after writing to BigQuery that the load is complete.
Upvotes: 1
Views: 545
Reputation: 17913
Currently this is not possible. It is complicated by the fact that a single `BigQueryIO.write()` may use many BigQuery jobs under the hood (i.e. `BigQueryIO.write()` is a general-purpose API for writing data to BigQuery, rather than an API for working with a single specific BigQuery load job), e.g.:

- If the data is large enough, `BigQueryIO.write()` will shard it into multiple load jobs.
- If you're using `DynamicDestinations` and are loading into multiple tables at the same time, there will be at least one load job per table.
- If you're writing an unbounded `PCollection` using the `BATCH_LOADS` method, it will periodically issue load jobs for newly arrived data, subject to the notes above.
- If you're using the `STREAMING_INSERTS` method (which is allowed even when writing a bounded `PCollection`), there will be no load jobs at all.

You will need to use one of the typical workarounds for "doing something after something else is done": for example, wait until the entire pipeline is done using `pipeline.run().waitUntilFinish()` in your main program, and then perform your second action.
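As a minimal sketch of that workaround in a Beam Java main program — `sendAcknowledgement()` is a hypothetical helper standing in for whatever notification mechanism you use, not part of Beam:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class LoadAndAck {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // ... build the pipeline here, ending in BigQueryIO.write() ...

    // Block until every transform has completed, including all BigQuery
    // load jobs issued under the hood by BigQueryIO.write().
    PipelineResult result = pipeline.run();
    PipelineResult.State state = result.waitUntilFinish();

    if (state == PipelineResult.State.DONE) {
      // Hypothetical helper: replace with your own acknowledgement
      // mechanism (Pub/Sub publish, HTTP callback, etc.).
      sendAcknowledgement();
    }
  }

  private static void sendAcknowledgement() {
    System.out.println("BigQuery load complete");
  }
}
```

Note that this acknowledges completion of the whole pipeline, not of any individual load job — which, per the list above, is the only granularity `BigQueryIO.write()` exposes.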
Upvotes: 5