kausal_malladi

Reputation: 1697

Chaining Multiple Map-Reduce jobs using Mincemeat.py

I am trying to build a large program using the Map-Reduce framework, which requires the entire process to be split into three Map-Reduce jobs that run sequentially.

I am using mincemeat.py because I read in many places that it is faster than octo.py and other Map-Reduce framework implementations in Python.

But I am not able to chain the jobs, because each client needs to supply a password and connect to the server before execution. My idea is that once a client is started, all the jobs should run in sequence. I am a newbie in Python, so I would appreciate any help with this.

Below is the code that starts a job, for example word count:

import mincemeat

# set up the server for one job (word count here)
s = mincemeat.Server()
s.datasource = datasource          # dict-like: keys -> input values
s.mapfn = map_wordCount
s.reducefn = reduce_wordCount

# blocks until connected clients finish, then returns the results dict
wordCounts = s.run_server(password="password")
print wordCounts

I want another job's map and reduce functions to be invoked without requiring a separate client invocation for it. Does anyone have pointers on how this can be done?
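Something like the following rough sketch is what I am after (map_secondJob and reduce_secondJob are placeholder names, and I am assuming run_server returns the results dict so it can be fed in as the next job's datasource):

import mincemeat

# Job 1: word count
s1 = mincemeat.Server()
s1.datasource = datasource
s1.mapfn = map_wordCount
s1.reducefn = reduce_wordCount
wordCounts = s1.run_server(password="password")

# Job 2: feed job 1's results in as the next job's datasource
s2 = mincemeat.Server()
s2.datasource = dict(wordCounts)
s2.mapfn = map_secondJob
s2.reducefn = reduce_secondJob
finalResults = s2.run_server(password="password")
print finalResults

But with this approach the clients exit after the first job and have to be restarted by hand for the second, which is exactly what I want to avoid.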

Thanks.

Upvotes: 3

Views: 466

Answers (1)

Mike McKerns

Reputation: 35247

Can you not just use map over a Pool of workers to launch a batch of jobs, where each job's goal is to launch another Pool of workers running the map-reduce jobs? I had never heard of mincemeat.py, but I do this with the pathos framework… which provides a Pool with a blocking map, an iterative imap, and an asynchronous amap (as well as pipes) for backends such as multiprocessing, threading, mpi4py, parallel python (socket-based distributed parallel computing), and ssh-tunneling.
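For illustration, here is a minimal sketch of the nested-pool idea (the word_count function and sample documents are made up, and it assumes a recent pathos where ProcessPool and ThreadPool live in pathos.pools):

from pathos.pools import ProcessPool, ThreadPool

def word_count(text):
    # inner level: map over the words of one document with a thread pool
    tpool = ThreadPool(2)
    pairs = tpool.map(lambda w: (w, 1), text.split())
    # local reduce: sum the counts per word
    counts = {}
    for word, n in pairs:
        counts[word] = counts.get(word, 0) + n
    return counts

if __name__ == '__main__':
    docs = ["the quick brown fox", "the lazy dog and the fox"]
    # outer level: one process per document-level job
    ppool = ProcessPool(2)
    print(ppool.map(word_count, docs))

For non-blocking launches, amap returns a handle whose get() blocks for the results, and imap yields results as they complete.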

This has the overhead of whichever backend or backends you choose, so for very small tasks a good bit of the time goes to overhead, but for anything larger the nested distributed parallel computing is a win.

You can find pathos (and pyina -- the mpi4py portion of pathos) here: https://github.com/uqfoundation

Upvotes: 1
