Reputation: 714
I have a pub/sub topic, which gets message as soon as a file is created in the bucket, with the streaming pipeline, I am able to get the object path. Created file is AVRO. Now in my pipeline I want to read all the content of the different files, that was captured in the pub/sub messages in the pvs step.
Any direction to the solution will be appriciated, using given functions in apache_beam.
Using python sdk
Upvotes: 1
Views: 384
Reputation: 714
Was able to do this using ReadAllFromAvro() Ref: https://beam.apache.org/releases/pydoc/2.29.0/apache_beam.io.avroio.html?highlight=readallfromavro#apache_beam.io.avroio.ReadAllFromAvro
with beam.Pipeline(options=self.pipeline_options) as pipeline:
(
pipeline
| "Read from Pub/Sub" >> beam.io.MultipleReadFromPubSub(pubsub_subscribers, with_attributes=True)
| "Windowing Transformation" >> GroupMessagesByFixedWindowsTransform(
self.streamverse_options.window_size)
| "Read From GCS" >> beam.io.ReadAllFromAvro()
| beam.Map(print)
)
Upvotes: 1