Reputation: 17119
I am looking for a Python library that can distribute tasks across a few servers. The tasks would be similar to those that the subprocess library can parallelize on a single machine.
I know that I could set up a Hadoop system for this purpose, but Hadoop is heavyweight. In my case, I would like to use a shared network disk for data I/O, and I don't need any fancy failure recovery. In MapReduce terminology, I only need mappers, not aggregators or reducers.
Any such library in Python? Thanks!
Upvotes: 1
Views: 194
Reputation: 63762
Try using Celery.
Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.
The execution units, called tasks, are executed concurrently on one or more worker servers using multiprocessing, Eventlet, or gevent. Tasks can execute asynchronously (in the background) or synchronously (wait until ready).
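To make this concrete, here is a minimal sketch of the mapper-only setup from the question. It assumes a message broker reachable from all servers (Redis at a hypothetical redis://localhost:6379/0 URL here; RabbitMQ would also work); the process_file task and the file paths are illustrative, not part of Celery itself.

    # tasks.py -- minimal Celery app; the broker/backend URLs are assumptions
    from celery import Celery

    app = Celery("tasks",
                 broker="redis://localhost:6379/0",
                 backend="redis://localhost:6379/0")

    @app.task
    def process_file(path):
        """Mapper-style work: read from the shared disk, write a result back."""
        with open(path) as f:
            data = f.read()
        out_path = path + ".out"
        with open(out_path, "w") as f:
            f.write(data.upper())  # placeholder for the real per-file work
        return out_path

Start a worker on each server with "celery -A tasks worker", then fan the work out from any machine and collect the results (the /mnt/shared paths below are hypothetical):

    from tasks import process_file

    # Dispatch one task per input file; a worker on any server may pick it up.
    paths = ["/mnt/shared/a.txt", "/mnt/shared/b.txt"]
    results = [process_file.delay(p) for p in paths]
    outputs = [r.get() for r in results]  # block until all mappers finish

Note that r.get() needs a result backend, which is why one is configured above; if you only write results to the shared disk and never inspect return values, you can drop the backend entirely.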
Upvotes: 3