Reputation: 85
I'm storing different documents in an index. Several workers search this index for the documents they need and use them in their own logic. I want to mark each found document as used by a worker (or delete it entirely), but because the workers query concurrently, several of them might get the same document.
Can Elasticsearch solve this, or do I need to implement locking/synchronization on my side? For example, if 2 workers each ask for the latest 20 documents, I need some way to return a different set of 20 documents to each worker.
Upvotes: 0
Views: 311
Reputation: 10389
Elasticsearch provides an update operation with optimistic locking support. So you could run a search to get a list of documents, then try to lock each one. The exact parameters to pass to the update API differ by Elasticsearch version (recent versions use if_seq_no and if_primary_term, older ones a version parameter), and you could use an additional attribute like locked: true, which lets the workers ignore locked documents in the initial search. Each update either succeeds or fails, depending on whether another worker has locked the document in the meantime. If it fails, just ignore that document.
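As a rough illustration of the search-then-lock flow, here is a minimal Python sketch. It assumes Elasticsearch 7.x or later, where the search option seq_no_primary_term and the update parameters if_seq_no and if_primary_term are available. The index name `jobs`, the `locked_by` field, and the `send` transport callable are hypothetical placeholders, not part of any client library; plug in whatever HTTP client you use.

```python
from typing import Callable, List, Tuple

# Hypothetical transport: (method, path, json_body) -> (http_status, parsed_json).
Send = Callable[[str, str, dict], Tuple[int, dict]]

INDEX = "jobs"  # hypothetical index name


def search_unlocked(send: Send, size: int = 20) -> List[dict]:
    """Return candidate hits that are not marked as locked."""
    body = {
        "size": size,
        # Ask Elasticsearch to include _seq_no and _primary_term in every hit.
        "seq_no_primary_term": True,
        "query": {"bool": {"must_not": [{"term": {"locked": True}}]}},
    }
    status, resp = send("POST", f"/{INDEX}/_search", body)
    return resp["hits"]["hits"] if status == 200 else []


def try_lock(send: Send, hit: dict, worker_id: str) -> bool:
    """Conditionally update one document; True only if this worker won the race."""
    path = (
        f"/{INDEX}/_update/{hit['_id']}"
        f"?if_seq_no={hit['_seq_no']}&if_primary_term={hit['_primary_term']}"
    )
    body = {"doc": {"locked": True, "locked_by": worker_id}}
    status, _ = send("POST", path, body)
    # HTTP 409 (version conflict) means the document changed since the search.
    return status == 200


def claim(send: Send, worker_id: str, size: int = 20) -> List[dict]:
    """Search, then keep only the documents this worker managed to lock."""
    return [h for h in search_unlocked(send, size) if try_lock(send, h, worker_id)]
```

Because search is near real time, a document locked a moment ago can still show up in another worker's search results. The conditional update is what actually prevents double processing, since its sequence number no longer matches. A worker may end up with fewer than 20 documents and can simply search again.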
How well this approach will work depends on the number of workers and the contention that would arise when they try to lock the same document. At the end of the day, Elasticsearch is not a queuing system and might not be optimized for these kinds of use cases.
You might also be interested in the Percolate Query, which reverses the condition. Instead of searching for the documents that match a particular query, your workers register a set of queries. Then, when indexing a document, you issue a percolate query to see whether the document matches any registered query, and push the document to a worker queue if it does. With this approach, Elasticsearch is only used for search. Distribution of jobs across the workers is handled by the worker queue.
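The percolate flow could be sketched roughly as follows in Python. The percolator field type and the percolate query are standard Elasticsearch features; the index name `worker-queries`, the `category` field, the `send` transport callable, and the in-process `queue.Queue` standing in for a real job queue are all assumptions made for illustration.

```python
import queue
from typing import Callable, List, Tuple

# Hypothetical transport: (method, path, json_body) -> (http_status, parsed_json).
Send = Callable[[str, str, dict], Tuple[int, dict]]

QUERY_INDEX = "worker-queries"  # hypothetical index that stores registered queries


def mapping_body() -> dict:
    """Index-creation body: a percolator field plus the fields the queries refer to."""
    return {
        "mappings": {
            "properties": {
                "query": {"type": "percolator"},
                "category": {"type": "keyword"},  # hypothetical document field
            }
        }
    }


def register_query(send: Send, query_id: str, query: dict) -> bool:
    """Store one query as a document in the percolator index."""
    status, _ = send("PUT", f"/{QUERY_INDEX}/_doc/{query_id}", {"query": query})
    return status in (200, 201)


def matching_queries(send: Send, document: dict) -> List[str]:
    """Ask which registered queries match the given (not yet indexed) document."""
    body = {"query": {"percolate": {"field": "query", "document": document}}}
    status, resp = send("POST", f"/{QUERY_INDEX}/_search", body)
    return [hit["_id"] for hit in resp["hits"]["hits"]] if status == 200 else []


def dispatch(send: Send, document: dict, work_queue: "queue.Queue[dict]") -> bool:
    """Enqueue the document once if any registered query matches it."""
    if matching_queries(send, document):
        work_queue.put(document)
        return True
    return False
```

Each item put on a shared queue is handed to exactly one consumer, so no locking inside Elasticsearch is needed. In a multi-process setup the `queue.Queue` would be replaced by an external message broker.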
Upvotes: 2