dendog

Reputation: 3328

Convert a stream into mini-batches for loading into BigQuery

I would like to build the following pipeline:

pub/sub --> dataflow --> bigquery

The data is streaming, but I would like to avoid streaming it directly into BigQuery. Instead, I was hoping to batch up small chunks on the Dataflow workers and write them to BQ as a load job once they reach a certain size or age.

I cannot find any examples of how to do this using the Python Apache Beam SDK - only Java.
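For illustration, this is the kind of size/time batching logic I have in mind, written as plain Python rather than Beam (the class and threshold names here are made up; in the real pipeline this would have to live inside a DoFn or trigger):

```python
import time


class MiniBatcher:
    """Accumulate records and flush when a size or age threshold is hit.

    Illustrative sketch only: `flush` stands in for whatever runs the
    BigQuery load job for one batch.
    """

    def __init__(self, flush, max_records=500, max_age_seconds=60):
        self.flush = flush                  # callback invoked per batch
        self.max_records = max_records      # flush when this many buffered
        self.max_age = max_age_seconds      # ...or when the batch is this old
        self.buffer = []
        self.first_seen = None

    def add(self, record):
        if self.first_seen is None:
            self.first_seen = time.monotonic()
        self.buffer.append(record)
        if (len(self.buffer) >= self.max_records
                or time.monotonic() - self.first_seen >= self.max_age):
            self._flush()

    def _flush(self):
        if self.buffer:
            self.flush(self.buffer)
        self.buffer = []
        self.first_seen = None
```

Each flushed batch would then be written with a single load job instead of going through the streaming-inserts API.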

Upvotes: 1

Views: 1366

Answers (1)

Guillem Xercavins
Guillem Xercavins

Reputation: 7058

This is work in progress. The FILE_LOADS method is currently only available for batch pipelines (with the use_beam_bq_sink experiments flag; it will become the default in the future).
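As a rough sketch of what that looks like in a batch pipeline (untested here; the table spec, schema, and project names are placeholders, and it assumes a Beam version where WriteToBigQuery exposes the `method` parameter):

```python
import apache_beam as beam
from apache_beam.io.gcp.bigquery import WriteToBigQuery
from apache_beam.options.pipeline_options import PipelineOptions

# The experiments flag opts in to the new BigQuery sink.
options = PipelineOptions(flags=['--experiments=use_beam_bq_sink'])

with beam.Pipeline(options=options) as p:
    (p
     | 'Read' >> beam.Create([{'name': 'a', 'count': 1}])  # bounded source
     | 'WriteToBQ' >> WriteToBigQuery(
         table='my-project:my_dataset.my_table',    # placeholder table spec
         schema='name:STRING,count:INTEGER',
         method=WriteToBigQuery.Method.FILE_LOADS,  # load jobs, not streaming inserts
         create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
         write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
```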

However, for streaming pipelines, as seen in the code, it will raise a NotImplementedError with the message:

File Loads to BigQuery are only supported on Batch pipelines.

There is an open JIRA ticket where you can follow the progress.

Upvotes: 4
