Reputation: 21
I'm writing a Dataflow streaming pipeline (in Python) that processes emails. The idea is that, when an email arrives, a Pub/Sub message is published, triggering the pipeline, which retrieves the email and processes it. The content of the Pub/Sub message is irrelevant; I only use it to trigger the pipeline.
I'm having some trouble with this last part. I managed to deploy the pipeline and connect it to a Pub/Sub topic, but when I try to test it (by publishing a message), nothing happens.
I guess I must set a window that "collects" messages and emits them at some point, but how should I do that? Is there a way to say "run the pipeline every time a new Pub/Sub message is received, ignoring its content"?
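For reference, here's roughly what my pipeline looks like (a minimal sketch; the topic path and the processing step are placeholders for my actual code):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def process_new_email(_msg):
    # The Pub/Sub payload is ignored; it only signals that a new email arrived.
    # Placeholder for the actual email retrieval and processing.
    return 'processed'

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (p
     | 'ReadTrigger' >> beam.io.ReadFromPubSub(topic='projects/my-project/topics/emails')
     | 'ProcessEmail' >> beam.Map(process_new_email))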
Thanks in advance!
Upvotes: 1
Views: 999
Reputation: 21
I finally managed to solve my problem. The issue was caused by the import of a custom pipeline options class that I had defined for that purpose. This import prevented the pipeline from being triggered; after removing it, the pipeline triggered as expected.
For those who may need it, the offending import was
from engine.user_options import UserOptions
and the imported class was
import apache_beam as beam

class UserOptions(beam.options.pipeline_options.PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        # Register --env as a runtime (ValueProvider) argument, as used in templated pipelines.
        parser.add_value_provider_argument('--env', type=str)
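For context, a ValueProvider option defined this way is meant to be read at runtime with .get(), for example inside a DoFn (a minimal sketch, not the actual code from my pipeline):

import apache_beam as beam

class TagWithEnv(beam.DoFn):
    """Hypothetical DoFn showing how the --env ValueProvider would be consumed."""
    def __init__(self, env):
        self._env = env  # a ValueProvider, not a plain string

    def process(self, element):
        # .get() is only valid at pipeline execution time.
        yield (self._env.get(), element)

options = UserOptions()
# In a pipeline: ... | beam.ParDo(TagWithEnv(options.env))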
Upvotes: 1
Reputation: 11041
Can you share more information about your pipeline and where the emails are stored?
I would recommend looking at some of the sample pipelines available in Beam.
If you share more info about your pipeline / code, I can try to iterate on it with you.
Upvotes: 0