user13128577

What is the benefit of using Google Cloud Pub/Sub in a streaming pipeline?

Can anyone explain the benefit of adopting Google Cloud Pub/Sub in a streaming pipeline?

I saw an example of an event streaming pipeline that used Pub/Sub to ingest the event data before passing it to Google Cloud Dataflow for transformation. Why doesn't Dataflow connect to the event data directly?

Thanks.

Upvotes: 1

Views: 436

Answers (1)

Tlaquetzal

Reputation: 2850

Dataflow needs a source to read data from. A streaming pipeline can use several different sources, and each has its own characteristics that may or may not fit your scenario.

With Pub/Sub you can easily publish events to a topic, either through a client library or by calling the API directly, and Pub/Sub guarantees at-least-once delivery of each message.
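
As a rough illustration of the publishing side, here is a minimal sketch using the `google-cloud-pubsub` Python client. The project ID, topic ID, and event fields are placeholders I made up, and the library import sits inside the function only so the snippet loads without the library installed.

```python
import json


def encode_event(event: dict) -> bytes:
    """Serialize an event to UTF-8 JSON; Pub/Sub message payloads are bytes."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def publish_event(project_id: str, topic_id: str, event: dict) -> str:
    """Publish one event and return the message ID assigned by the server."""
    # Imported lazily so encode_event works without google-cloud-pubsub.
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    # publish() returns a future; result() blocks until the server responds.
    future = publisher.publish(topic_path, encode_event(event))
    return future.result()


# Hypothetical usage (needs credentials and an existing topic):
# publish_event("my-project", "events", {"user": "u1", "action": "click"})
```

The producer only needs to know the topic name. It does not need to know whether Dataflow, or anything else, is consuming the events.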

When you connect it to a Dataflow streaming pipeline, you get a resilient architecture (Pub/Sub keeps redelivering a message until Dataflow acknowledges that it has processed it) and near-real-time processing. In addition, Dataflow can use Pub/Sub metrics to scale up or down depending on the number of messages in the backlog.
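
To make the acknowledge-or-redeliver idea concrete, here is a toy in-memory model in plain Python. It is not the Pub/Sub API: `ToySubscription`, `expire_deadlines`, and `process_all` are names invented for this sketch, and real Pub/Sub uses timed ack deadlines rather than an explicit call.

```python
from collections import deque


class ToySubscription:
    """In-memory stand-in for a subscription; not the real Pub/Sub API."""

    def __init__(self):
        self._backlog = deque()  # messages waiting to be delivered
        self._unacked = {}       # delivered but not yet acknowledged
        self._next_id = 0

    def publish(self, data):
        self._backlog.append((self._next_id, data))
        self._next_id += 1

    def pull(self):
        """Deliver the oldest message, or None if nothing is waiting."""
        if not self._backlog:
            return None
        msg_id, data = self._backlog.popleft()
        self._unacked[msg_id] = data
        return msg_id, data

    def ack(self, msg_id):
        """The consumer confirms processing, so the message is dropped."""
        self._unacked.pop(msg_id, None)

    def expire_deadlines(self):
        """Simulate ack deadlines passing: unacked messages are re-queued."""
        for msg_id, data in sorted(self._unacked.items()):
            self._backlog.append((msg_id, data))
        self._unacked.clear()

    def backlog_size(self):
        """Undelivered plus unacknowledged messages (a scaling signal)."""
        return len(self._backlog) + len(self._unacked)


def process_all(sub, handler):
    """Pull until everything is acknowledged, acking only on success."""
    processed = []
    while sub.backlog_size() > 0:
        item = sub.pull()
        if item is None:
            sub.expire_deadlines()
            continue
        msg_id, data = item
        try:
            handler(data)
        except RuntimeError:
            continue  # no ack, so the message will be redelivered later
        sub.ack(msg_id)
        processed.append(data)
    return processed
```

A handler that fails once on a message still ends up processing it, just later and possibly out of order. This is why consumers of an at-least-once source should tolerate duplicates and reordering.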

Finally, the Dataflow runner uses an optimized version of the PubSubIO connector, which provides additional features. I suggest checking the documentation that describes some of these features.
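
For reference, a streaming pipeline that uses Pub/Sub as its source might look roughly like the sketch below, written with the Apache Beam Python SDK. The subscription path and the JSON payload format are assumptions for illustration, and the Beam imports are deferred so the parsing helper can be tried without Beam installed.

```python
import json
import logging


def parse_event(payload: bytes) -> dict:
    """Decode a Pub/Sub payload (bytes) into a dict; assumes JSON events."""
    return json.loads(payload.decode("utf-8"))


def run(subscription: str, pipeline_args=None):
    """Read events from a Pub/Sub subscription, parse them, and log them."""
    # Imported lazily so parse_event works without apache-beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # streaming=True marks the source as unbounded. Pass flags such as
    # --runner=DataflowRunner in pipeline_args to run on Dataflow.
    options = PipelineOptions(pipeline_args, streaming=True)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadEvents" >> beam.io.ReadFromPubSub(subscription=subscription)
            | "Parse" >> beam.Map(parse_event)
            | "Log" >> beam.Map(lambda event: logging.info("event: %s", event))
        )


# Hypothetical usage:
# run("projects/my-project/subscriptions/events-sub")
```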

Upvotes: 3
