Reputation: 597016
We need to process long-running tasks (parsing huge XML files and inserting the data into a DB) across multiple nodes. There won't be many nodes; we are even going to start with a single one.
The files are going to be read from an FTP server. The job will be scheduled and will run once a day. What would be a good and easy way to distribute the processing?
My current draft thoughts are:
- ConcurrentMap: it handles the synchronization of the map behind the scenes.
- putIfAbsent(..): if the file is not in the map, process it. If it is in the map, it means another node is processing it, so try the next file.
That way:
- .putIfAbsent(..) call and the underlying synchronization (shuffling is meant to improve this as well)
I'm not sure if this is the best approach though. Is it OK? What can be improved? Is there a better one?
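To make the idea concrete, here is a minimal sketch of the claim-by-putIfAbsent loop. It assumes the map is shared between nodes (for example, one provided by a distributed cache); a plain ConcurrentHashMap stands in for it here, and the FileClaimer class, the node ids, and the empty process(..) method are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentMap;

class FileClaimer {
    // Assumption: in a real deployment this map is shared between nodes.
    private final ConcurrentMap<String, String> claims;
    private final String nodeId;

    FileClaimer(ConcurrentMap<String, String> claims, String nodeId) {
        this.claims = claims;
        this.nodeId = nodeId;
    }

    /** Tries to claim every file; returns the ones this node claimed and processed. */
    List<String> run(List<String> files) {
        List<String> shuffled = new ArrayList<>(files);
        // Shuffle so that different nodes tend to start with different files.
        Collections.shuffle(shuffled);

        List<String> processed = new ArrayList<>();
        for (String file : shuffled) {
            // putIfAbsent returns null only for the caller that inserted the key,
            // so exactly one node wins the claim for each file.
            if (claims.putIfAbsent(file, nodeId) == null) {
                process(file);
                processed.add(file);
            }
            // Otherwise another node already claimed it: move on to the next file.
        }
        return processed;
    }

    private void process(String file) {
        // Placeholder for parsing the XML file and inserting its data into the DB.
    }
}
```

Note that in this sketch a claim is never released, so a file claimed by a node that crashes mid-processing would stay claimed until something removes the entry.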
Upvotes: 3
Views: 256
Reputation: 533442
Based on your comments, I would suggest considering JMS (for example ActiveMQ, which I found the simplest to use and develop with).
It can be stand-alone, redundant and/or embedded.
You can add messages to a queue and consume them from any number of nodes. With auto-commit turned off (that is, with a transacted session or client acknowledgement), a failing node's uncommitted messages are returned to the queue automatically.
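The return-to-queue behaviour can be illustrated with a small in-process sketch. This is not the JMS API and not how ActiveMQ is implemented: a BlockingQueue stands in for the broker, and the WorkQueueSketch class and its methods are hypothetical. It only mimics the semantics, where a message is gone for good only once the worker finishes without failing.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

class WorkQueueSketch {
    // Stand-in for a broker queue; a real broker shares this across processes.
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    /** Adds one message (for example a file name) to the queue. */
    void publish(String fileName) {
        queue.add(fileName);
    }

    /** Hands one message to the worker; returns false if the queue was empty. */
    boolean consumeOne(Consumer<String> worker) {
        String message = queue.poll();
        if (message == null) {
            return false;
        }
        try {
            worker.accept(message);
            // Success plays the role of "commit": the message stays removed.
        } catch (RuntimeException e) {
            // Failure plays the role of "rollback": the message is put back
            // so that another consumer can pick it up.
            queue.add(message);
        }
        return true;
    }

    /** Number of messages still waiting to be processed. */
    int pending() {
        return queue.size();
    }
}
```

With a real broker, each node would run its own consumer against the same queue, so adding nodes does not require any coordination code in the nodes themselves.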
Upvotes: 2