Reputation: 3328
I would like to build the following pipeline:
pub/sub --> dataflow --> bigquery
The data is streaming, but I would like to avoid inserting it into BigQuery via streaming inserts. Instead, I was hoping to batch up small chunks on the Dataflow workers and write them into BQ as a load job once they reach a certain size or age.
I cannot find any examples of how to do this using the Python Apache Beam SDK, only Java ones.
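For concreteness, this is roughly the pipeline shape I have in mind (project, topic, and window size are illustrative placeholders):

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions
from apache_beam.transforms.window import FixedWindows

options = PipelineOptions()
options.view_as(StandardOptions).streaming = True

with beam.Pipeline(options=options) as p:
    batches = (
        p
        | 'ReadPubSub' >> beam.io.ReadFromPubSub(
            topic='projects/my-project/topics/my-topic')  # placeholder topic
        | 'Window' >> beam.WindowInto(FixedWindows(5 * 60)))  # e.g. 5-minute chunks
    # Missing piece: write each windowed chunk to BigQuery as a load job
    # rather than element-by-element streaming inserts.
```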
Upvotes: 1
Views: 1366
Reputation: 7058
This is a work in progress. The FILE_LOADS
method is currently only available for batch pipelines (it is enabled with the use_beam_bq_sink
experiments flag and will become the default in the future).
However, for streaming pipelines, as seen in the code, it will raise a NotImplementedError
with the message:
File Loads to BigQuery are only supported on Batch pipelines.
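For reference, this is a minimal sketch of how the method is requested on a batch pipeline; the source, table, and schema here are placeholders, not part of the question:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# The new BigQuery sink is gated behind the use_beam_bq_sink experiment.
options = PipelineOptions(['--experiments=use_beam_bq_sink'])

with beam.Pipeline(options=options) as p:
    (p
     | 'Read' >> beam.io.ReadFromText('gs://my-bucket/input.txt')  # bounded source
     | 'ToRow' >> beam.Map(lambda line: {'line': line})
     | 'Write' >> beam.io.WriteToBigQuery(
         'my-project:my_dataset.my_table',  # placeholder table
         schema='line:STRING',
         method=beam.io.WriteToBigQuery.Method.FILE_LOADS))
```

Running the same write with method=FILE_LOADS on an unbounded (streaming) pipeline is what currently raises the NotImplementedError above.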
There is an open JIRA ticket where you can follow the progress.
Upvotes: 4