user10332687
user10332687

Reputation:

Handling of pubsub subscribers for distributed longrunning tasks

I am evaluating the use of using pubsub for long-running tasks such as video transcoding, where a particular transcode may take between 2-10 minutes. Is pubsub a good approach for such a task distribution? For example, let's say I have five servers:

- publisher1
- publisher2
- publisher3
- publisher4
- publisher5

And a topic called "videos". Would it be possible to spread out the messages equally across those five servers? What about when servers are added or removed? What would be a good approach to doing this, or is pubsub not the right tool for something like this?

Upvotes: 0

Views: 517

Answers (1)

Daniel Collins
Daniel Collins

Reputation: 812

This does sound like a reasonable use case for pubsub. Specifically, if you use a pull subscriber, you can configure flow control settings to have at most one outstanding message to your server, and configure the max ack extension period (in java) to be a reasonable upper bound of your processing time. This api is described here http://googleapis.github.io/google-cloud-java/google-cloud-clients/apidocs/index.html?com/google/cloud/pubsub/v1/package-summary.html

This should effectively load balance across your servers by default if you use the same subscriber id for all jobs. If a server is added and backlog exists, it will receive a new entry. If a server is removed, it will no longer be sent messages. If it removed while processing or crashes, the message it was working on will be resent to another server.

One concern however is that pubsub has a limit of 10MB per message. You might consider instead putting the data itself in a google cloud storage bucket. Cloud storage can publish the file location to a pubsub topic when an upload is complete. https://cloud.google.com/storage/docs/pubsub-notifications

Upvotes: 1

Related Questions