Reputation: 1181
The Apache Beam Java SDK has a DynamicDestinations class that allows writing to different big query tables based on the input element. I don't see anything in the Python SDK that looks equivalent. Is there some class that allows writing to dynamically chosen destination tables in the Python SDK?
Upvotes: 4
Views: 1807
Reputation: 181
An experimental write transform was added to the Beam Python SDK in 2.14.0, beam.io.fileio.WriteToFiles:
    my_pcollection | beam.io.fileio.WriteToFiles(
        path='/my/file/path',
        destination=lambda record: 'avro' if record['type'] == 'A' else 'csv',
        sink=lambda dest: AvroSink() if dest == 'avro' else CsvSink(),
        file_naming=beam.io.fileio.destination_prefix_naming())
This can be used to choose a different destination file per record.
There isn't a BigQuery sink for this transform; you would have to create a new class inheriting from beam.io.fileio.FileSink. More documentation here:
https://beam.apache.org/releases/pydoc/2.14.0/apache_beam.io.fileio.html#dynamic-destinations
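As a rough illustration, here is a minimal sketch of the open/write/flush interface such a sink implements. The class name JsonLinesSink and the newline-delimited JSON format are made up for the example; it is shown as a plain class so it stands alone, but in a real pipeline it would inherit from apache_beam.io.fileio.FileSink:

```python
import json


class JsonLinesSink:
    """Illustrative sink that writes each record as one JSON line.

    A real implementation would subclass apache_beam.io.fileio.FileSink;
    Beam calls open() with a writable file handle for each destination
    file, write() once per record, and flush() before closing the file.
    """

    def open(self, fh):
        # Keep the file handle Beam hands us for this destination file.
        self._fh = fh

    def write(self, record):
        # Serialize the record as one newline-delimited JSON row.
        self._fh.write(json.dumps(record).encode('utf-8') + b'\n')

    def flush(self):
        self._fh.flush()
```

An instance of a class like this is what the `sink=lambda dest: ...` callable in the snippet above would return for a given destination.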
And the JIRA issue here:
https://issues.apache.org/jira/browse/BEAM-2857
Upvotes: 4
Reputation: 8178
The Apache Beam Python SDK is not yet as feature-complete as the Java SDK, so some features you are looking for may still be available only in Java.
As far as I know, and based on what I can find in the Python SDK's BigQuery I/O documentation, there is currently no class for specifying dynamic BigQuery destinations equivalent to the Java SDK's DynamicDestinations.
I would suggest filing a feature request in the Apache Beam Jira issue tracker explaining why this feature would be a good addition to the Python SDK; hopefully the developers will consider it.
Upvotes: 4