Reputation: 19
I'm working on an application that will receive 40 million records a day. Can Pub/Sub handle that volume? I have also seen that in some cases Pub/Sub delivers duplicate messages. How can we avoid this?
Upvotes: 0
Views: 833
Reputation: 17261
Yes, 40 million records a day (~460/s) is definitely feasible for Pub/Sub. The service is designed to scale horizontally with your load to tens of GB per second. Pub/Sub is an at-least-once delivery service by default, which means that duplicates are possible. There is an exactly-once delivery feature, currently in public preview, which provides stronger guarantees, such as no redelivery of a message once it has been successfully acknowledged.
Even with that feature, a message that is not acked before its ack deadline will be redelivered, so duplicates are not avoided entirely. If you need exactly-once processing, then Dataflow can be a good choice.
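If you stay on at-least-once delivery, one common approach is to make the subscriber idempotent by remembering which message IDs it has already processed. The following is only a minimal sketch of that idea, not something from the Pub/Sub client library: `Deduplicator` and `handle` are hypothetical names, and the in-memory store is an assumption that only works for a single subscriber process (several subscribers would need a shared store such as a database or cache).

```python
from collections import OrderedDict


class Deduplicator:
    """Bounded in-memory record of processed message IDs (hypothetical helper)."""

    def __init__(self, max_ids=100_000):
        self._seen = OrderedDict()
        self._max_ids = max_ids

    def seen(self, message_id):
        return message_id in self._seen

    def mark(self, message_id):
        self._seen[message_id] = True
        # Evict the oldest ID once the bound is exceeded so memory stays flat.
        if len(self._seen) > self._max_ids:
            self._seen.popitem(last=False)


def handle(message_id, payload, dedup, process):
    """Run process(payload) only for IDs not seen before.

    Returns True if the payload was processed, False if it was a duplicate.
    The ID is marked only after process() succeeds, so a failed message
    can still be retried on redelivery.
    """
    if dedup.seen(message_id):
        return False
    process(payload)
    dedup.mark(message_id)
    return True
```

In a real subscriber callback you would presumably call `handle(message.message_id, message.data, dedup, process)` and then ack the message in both the processed and the duplicate case, so that Pub/Sub stops redelivering it.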
Upvotes: 2