hamdog

Reputation: 1181

Apache Beam DynamicDestinations Python equivalent

The Apache Beam Java SDK has a DynamicDestinations class that allows writing to different big query tables based on the input element. I don't see anything in the Python SDK that looks equivalent. Is there some class that allows writing to dynamically chosen destination tables in the Python SDK?

Upvotes: 4

Views: 1807

Answers (2)

anrope

Reputation: 181

An experimental write was added to the Beam python SDK in 2.14.0, beam.io.fileio.WriteToFiles:

my_pcollection | beam.io.fileio.WriteToFiles(
    path='/my/file/path',
    destination=lambda record: 'avro' if record['type'] == 'A' else 'csv',
    sink=lambda dest: AvroSink() if dest == 'avro' else CsvSink(),
    file_naming=beam.io.fileio.destination_prefix_naming())

which can be used to write to different files per-record.

There isn't a BigQuerySink; you would have to create a new class inheriting from beam.io.fileio.FileSink. More documentation here:

https://beam.apache.org/releases/pydoc/2.14.0/apache_beam.io.fileio.html#dynamic-destinations

And the JIRA issue here:

https://issues.apache.org/jira/browse/BEAM-2857
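To illustrate what such a sink class involves, here is a minimal sketch of the FileSink contract (open/write/flush). In a real pipeline it would subclass apache_beam.io.fileio.FileSink and be handed a file handle by WriteToFiles; the record keys used here are hypothetical.

```python
# Sketch of a sink following the beam.io.fileio.FileSink contract.
# In a pipeline this would be: class CsvSink(beam.io.fileio.FileSink).
class CsvSink:
    def open(self, fh):
        # WriteToFiles calls open() with the file handle for the
        # destination file chosen by the destination callable.
        self._fh = fh

    def write(self, record):
        # Serialize one element; here a dict is written as a CSV row
        # with keys in sorted order (a simplification for the sketch).
        self._fh.write(','.join(str(record[k]) for k in sorted(record)) + '\n')

    def flush(self):
        # Called before the file is closed.
        self._fh.flush()
```

Beam passes a binary handle in practice, so a real implementation would encode to bytes (or wrap the handle), but the shape of the class is the same.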

Upvotes: 4

dsesto

Reputation: 8178

The Apache Beam Python SDK is not yet as feature-complete as the Java SDK, so some features are still only available in Java.

As far as I know, and based on what I can find in the Python SDK's BigQuery IO documentation, there is currently no class for specifying dynamic BigQuery destinations like the one the Java SDK provides (DynamicDestinations).

I would suggest filing a feature request in the Apache Beam Jira issue tracker explaining why this feature would be a good addition to the Python SDK, and hopefully the developers will consider it.

Upvotes: 4
