Gravma

Reputation: 21

Connect Pub/Sub to Dataflow Python pipeline

I'm writing a Dataflow streaming pipeline (in Python) that processes emails. The idea is that when an email arrives, a Pub/Sub message is published, triggering the pipeline, which then retrieves the email and processes it. The content of the Pub/Sub message is irrelevant, since I only use it as a trigger.

I'm having trouble with this last part. I managed to deploy the pipeline and connect it to a Pub/Sub topic, but when I test it by publishing a message, nothing happens.

I guess I must set a window that "collects" messages and emits them at some point, but how should I do that? Is there a way to say "start the pipeline every time a new Pub/Sub message is received, ignoring its content"?

Thanks in advance!

Upvotes: 1

Views: 999

Answers (2)

Gravma

Reputation: 21

I finally managed to solve my problem. The issue was caused by importing a custom pipeline options class that I had defined for that purpose. This import prevented the pipeline from being triggered. After removing it, the pipeline was triggered as expected.

For those who may need it, the offending import was

from engine.user_options import UserOptions

and the imported class was

import apache_beam as beam


class UserOptions(beam.options.pipeline_options.PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_value_provider_argument('--env', type=str)
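
If the custom class was only there to supply `--env`, one possible workaround (an assumption, not something confirmed above) is to parse that flag with the standard library's `argparse` in the main module and hand the remaining arguments to Beam. The helper name `split_args` is hypothetical. Note that this reads `--env` at pipeline construction time, so it is not a drop-in replacement for `add_value_provider_argument` when building templates.

```python
import argparse


def split_args(argv=None):
    """Separate the custom --env flag from the arguments meant for Beam.

    `split_args` is a hypothetical helper name used for illustration.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument('--env', type=str, default=None)
    # parse_known_args returns (parsed namespace, list of unrecognized args),
    # so Beam/Dataflow flags such as --runner pass through untouched.
    known_args, pipeline_args = parser.parse_known_args(argv)
    return known_args, pipeline_args
```

The second return value can then be passed on as `PipelineOptions(pipeline_args)`, while `known_args.env` is used directly in the main module.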

Upvotes: 1

Pablo

Reputation: 11041

Can you share more information about your pipeline and where the emails are stored?

I would recommend looking at some of the sample pipelines available in Beam.

If you share more info about your pipeline / code, I can try to iterate on it with you.
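
As a starting point, here is a minimal sketch of a streaming pipeline that reacts to each Pub/Sub message and ignores its payload. No explicit window should be needed for this: element-wise transforms such as `beam.Map` handle each message as it arrives, and windowing only matters once you aggregate (for example with `GroupByKey`). The function names `handle_trigger` and `run`, the `topic` parameter, and the placeholder processing body are assumptions, since the actual email retrieval code has not been shared.

```python
import logging


def handle_trigger(_message: bytes) -> str:
    """Ignore the Pub/Sub payload and run the email-processing step.

    Placeholder body: the real email retrieval logic is not shown here.
    """
    logging.info("Trigger received; fetching and processing email")
    return "processed"


def run(topic: str, pipeline_args=None):
    """Build and run the pipeline (hypothetical entry point)."""
    # Beam is imported here so the module can be imported without it.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import (
        PipelineOptions,
        StandardOptions,
    )

    options = PipelineOptions(pipeline_args)
    # ReadFromPubSub is an unbounded source, so streaming mode must be on.
    options.view_as(StandardOptions).streaming = True

    with beam.Pipeline(options=options) as p:
        (
            p
            # Each published message becomes one element (raw bytes).
            | "ReadTrigger" >> beam.io.ReadFromPubSub(topic=topic)
            # The payload is discarded; the message only acts as a trigger.
            | "ProcessEmail" >> beam.Map(handle_trigger)
        )
```

The `topic` argument is expected in the form `projects/<project>/topics/<topic>`. If publishing a test message still does nothing, the worker logs are a good place to check whether the job started in streaming mode.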

Upvotes: 0
