Reputation: 1697
I am trying to build a large program on a Map-Reduce framework; the process needs to be split into three Map-Reduce jobs that run sequentially.
I am using mincemeat.py because I read in several places that it is faster than octo.py and other Python framework implementations.
However, I am not able to chain the multiple jobs, because each client has to supply a password and connect to the server before execution. What I want is that once a client starts, all the jobs run in sequence. I am a newbie in Python, so I would appreciate any help with this.
Below is the code that starts a single job, in this example a word count:
import mincemeat

s = mincemeat.Server()
s.datasource = datasource      # dict-like source of input key/value pairs
s.mapfn = map_wordCount        # map function for this job
s.reducefn = reduce_wordCount  # reduce function for this job
wordCounts = s.run_server(password="password")  # blocks until the job finishes
print wordCounts               # mincemeat.py targets Python 2
I want the next job's map and reduce functions to be called without requiring a separate client invocation for it. Any pointers on how this can be done would be welcome.
Thanks.
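The chaining idea itself can be sketched without mincemeat: treat each job as a (mapfn, reducefn) pair and feed the output dict of one job in as the datasource of the next. With mincemeat.py, the `run_job` helper below would instead create a `Server`, set `datasource`/`mapfn`/`reducefn`, and call `run_server(password=...)`, which blocks until that job completes; the helper and the second job here are hypothetical stand-ins for illustration:

```python
# Sketch: chain map-reduce jobs by feeding each job's output dict
# into the next job's datasource. run_job is a local stand-in for
# building a mincemeat Server and calling run_server().

def run_job(datasource, mapfn, reducefn):
    """Run one map-reduce job locally (stand-in for a mincemeat Server)."""
    intermediate = {}
    for key, value in datasource.items():
        for k, v in mapfn(key, value):
            intermediate.setdefault(k, []).append(v)
    return {k: reducefn(k, vs) for k, vs in intermediate.items()}

# Job 1: word count
def map_wordCount(key, text):
    for word in text.split():
        yield word, 1

def reduce_wordCount(word, counts):
    return sum(counts)

# Job 2 (hypothetical): group words by frequency
def map_invert(word, count):
    yield count, word

def reduce_invert(count, words):
    return sorted(words)

datasource = {"doc1": "a b a", "doc2": "b c"}
counts = run_job(datasource, map_wordCount, reduce_wordCount)
by_freq = run_job(counts, map_invert, reduce_invert)  # chained: output -> input
print(counts)   # {'a': 2, 'b': 2, 'c': 1}
print(by_freq)  # {2: ['a', 'b'], 1: ['c']}
```

Because each `run_server` call blocks until its job is done, running the servers back-to-back like this gives you strictly sequential jobs from a single script.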
Upvotes: 3
Views: 466
Reputation: 35247
Can you not just use `map` from a `Pool` of workers that launches a batch of jobs, each of which launches another `Pool` of workers running map-reduce jobs? I have never heard of mincemeat.py, but I do this with the `pathos` framework, which provides a `Pool` with a blocking `map`, an iterative `imap`, and an asynchronous `amap` (as well as pipes) for backends such as `multiprocessing`, `threading`, `mpi4py`, parallel python (socket-based distributed parallel computing), and ssh-tunneling.
This has the overhead of whatever backend or backends you choose, so for very small tasks a good bit of the total time is overhead, but for anything larger the nested distributed parallel computing is a win.
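The three call styles mentioned above can be illustrated with the standard library's thread-backed pool, which exposes an analogous blocking/iterative/asynchronous trio (`map`/`imap`/`map_async`; pathos names the asynchronous one `amap`). This is a sketch of the calling conventions, not pathos itself:

```python
# Blocking, iterative, and asynchronous pool maps, shown with the
# stdlib thread pool; pathos pools offer the analogous map/imap/amap.
from multiprocessing.dummy import Pool  # thread pool, same API as process Pool

def square(x):
    return x * x

pool = Pool(4)

blocking = pool.map(square, range(5))            # blocks until all results ready
iterative = list(pool.imap(square, range(5)))    # yields results one at a time
async_result = pool.map_async(square, range(5))  # returns immediately
asynchronous = async_result.get()                # block only when values are needed

pool.close()
pool.join()

print(blocking)      # [0, 1, 4, 9, 16]
print(iterative)     # [0, 1, 4, 9, 16]
print(asynchronous)  # [0, 1, 4, 9, 16]
```

The asynchronous form is what makes the nested pattern convenient: the outer pool can fire off a batch of jobs and collect results only when every inner map-reduce job has finished.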
You can find `pathos` (and `pyina`, the `mpi4py` portion of `pathos`) here: https://github.com/uqfoundation
Upvotes: 1