Reputation: 242
I have a REST API composed of different resources. Some of those resources are also indexed and kept in sync in Elasticsearch (ES), and I'm implementing a queue system to manage these operations asynchronously. I decided to use Beanstalkd as the queue system.
My thought
Each resource will have its own tube, so indexing jobs are split by resource. For example, I will have tubes such as "index_users" and "index_posts", which will receive jobs containing the ids of the resources to index in ES:
->useTube('index_users')->put( json_encode( [ 'ids' => [ 33, 35, 66 ] ] ) );
Having a different tube for each resource helps me keep things separated (for example, I can stop indexing users just by deleting the index_users tube). Jobs will be processed faster because there will be fewer jobs per queue, and a huge number of indexing operations on one resource will not affect the indexing of other resources.
My questions
Upvotes: 0
Views: 197
Reputation: 208002
This is a good way to proceed. I would place only one id per message rather than several: if the job fails, you can retry only that one. You have better control with a single id per message.
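As a rough sketch of the one-id-per-message idea, the list of ids from the question could be split into one payload per id before anything is put on the tube. The helper name `buildIndexJobs` and the `{"id": ...}` payload shape are only assumptions for illustration; the actual `put` call depends on the Beanstalkd client you use.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of any library): turns a list of resource
 * ids into one job per id, so a failed job only ever retries a single id.
 * It only builds the tube name and JSON payload; sending is left to the
 * Beanstalkd client.
 */
function buildIndexJobs(string $resource, array $ids): array
{
    $tube = 'index_' . $resource; // tube naming scheme taken from the question
    $jobs = [];
    foreach (array_unique($ids) as $id) { // skip duplicate ids
        $jobs[] = [
            'tube'    => $tube,
            'payload' => json_encode(['id' => $id]), // assumed payload shape
        ];
    }
    return $jobs;
}

$jobs = buildIndexJobs('users', [33, 35, 66]);
foreach ($jobs as $job) {
    // In a real producer, each pair would go to the client's put call.
    echo $job['tube'], ' ', $job['payload'], PHP_EOL;
}
```

Each entry can then be retried, buried, or deleted independently of the others.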
Beanstalkd is fast and will work fine with your numbers. You can easily go with multiple tubes. It's even better, because you can set up the number of workers for each tube based on its number of messages and its rate.
If you need a good admin interface for Beanstalkd you can try out https://github.com/ptrofimov/beanstalk_console
On the other hand, look into bulk operations in Elasticsearch. If that is something you would take advantage of, then you need to place multiple IDs in each job so the worker can bulk index them.
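To illustrate the bulk variant, here is a minimal sketch of a worker turning a multi-id job into a request body for the Elasticsearch `_bulk` endpoint, which expects newline-delimited JSON: an action line followed by a document line for each item, with a trailing newline. The function name `buildBulkBody`, the index name `users`, and the placeholder documents are assumptions; a real worker would load the resources from its own data store and send the body with `Content-Type: application/x-ndjson`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: builds an NDJSON body for the Elasticsearch _bulk API
 * from an array of documents keyed by id.
 */
function buildBulkBody(string $index, array $docsById): string
{
    $lines = [];
    foreach ($docsById as $id => $doc) {
        // Action line: index this document under the given id.
        $lines[] = json_encode(['index' => ['_index' => $index, '_id' => (string) $id]]);
        // Source line: the document itself.
        $lines[] = json_encode($doc);
    }
    // The bulk body must end with a newline.
    return $lines === [] ? '' : implode("\n", $lines) . "\n";
}

// Job payload in the shape used in the question.
$job = json_decode('{"ids":[33,35,66]}', true);

$docs = [];
foreach ($job['ids'] as $id) {
    // Placeholder document; a real worker would fetch the resource here.
    $docs[$id] = ['id' => $id, 'name' => 'user ' . $id];
}

$body = buildBulkBody('users', $docs);
echo $body;
```

The trade-off is that a bulk job fails or retries as a unit unless the worker inspects the per-item results in the bulk response.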
Upvotes: 0