Reputation: 7169
I am relatively new to distributed computing, so forgive me if I misunderstand some of the basic concepts here. I am looking for a (preferably) Python-based alternative to Hadoop for processing large data sets via MapReduce on a cluster that uses an SGE-based grid engine (e.g. Open Grid Scheduler or Sun Grid Engine). I have had good luck running basic distributed jobs with PythonGrid, but I'd really like a more feature-rich framework for running my jobs.

I have read up on tools like Disco and MinceMeatPy, both of which seem to offer true Map-Sort-Reduce job processing, but there does not seem to be any obvious support for SGE in either. This makes me wonder whether it is possible to achieve true MapReduce functionality through a grid scheduler at all, or whether the tools just don't support it out-of-the-box because grid engines are not commonly used for this.

Can you perform Map-Sort-Reduce tasks on a grid engine? Are there Python tools that support this? How difficult would it be to rig existing MapReduce tools to use SGE job schedulers?
Upvotes: 2
Views: 1087
Reputation: 590
I've heard that Jug works. It uses the filesystem for coordination among the parallel tasks. With that kind of framework, you write your code, run "jug status primes.py" on the machine you're on to check progress, and then start a grid array job with as many workers as you like, all running "jug execute primes.py".
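For reference, a minimal Jug script looks roughly like the primes example from its documentation (the numbers and names below are just illustrative):

```python
# primes.py -- minimal Jug script, adapted from the primes example in
# Jug's documentation; the numbers here are illustrative.
from jug import TaskGenerator

@TaskGenerator
def is_prime(n):
    # Naive primality test; each call becomes one Jug task, coordinated
    # through files that Jug writes to the shared filesystem.
    for j in range(2, n):
        if n % j == 0:
            return False
    return True

primes_up_to_100 = [is_prime(n) for n in range(2, 101)]
```

Each worker then just runs "jug execute primes.py", so an SGE job script whose body is that single line, submitted with something like "qsub -t 1-20", gives you 20 workers chewing through the tasks, while "jug status primes.py" on the head node shows progress.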
mincemeat.py should be able to work the same way, but it looks to use the network for coordination instead, so it may depend on whether your compute nodes can talk back to a server running the overall script.
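For context, mincemeat.py's usage is roughly as follows (this is reconstructed from its published word-count example, so treat the exact API details as assumptions): a server process defines the map/reduce functions and the data source, and each node runs the mincemeat.py client pointed at the server.

```python
# server.py -- word-count sketch in the style of mincemeat.py's example;
# the exact API details (Server attributes, run_server signature) are assumptions.
import mincemeat

data = ["the quick brown fox", "jumps over the lazy dog"]

def mapfn(key, value):
    for word in value.split():
        yield word, 1

def reducefn(key, values):
    return sum(values)

server = mincemeat.Server()
server.datasource = dict(enumerate(data))
server.mapfn = mapfn
server.reducefn = reducefn

# Workers connect back to this process over TCP, which is why the
# compute nodes need network access to wherever this script runs.
results = server.run_server(password="changeme")
print(results)
```

The grid array job would then just run the mincemeat.py client on each node, pointed at the machine running this server script.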
There are several release notes about running actual Hadoop MapReduce and HDFS on SGE, but I haven't been able to find good documentation.
If you're used to Hadoop streaming with Python, it's not too bad to replicate on SGE. I've had some success with this at work: one array job runs map + shuffle for each input file, and a second array job runs sort + reduce for each reducer number. The shuffle part just writes files to a network directory with names like mapper00000_reducer00000, mapper00000_reducer00001, and so on (one file per mapper/reducer pair). Then reducer 00001 sorts all the files labeled reducer00001 together and pipes them into the reduce code.
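As a rough sketch of what those two array tasks can look like (the file layout matches the description above, but the function names, the stable-hash partitioner, and the use of SGE_TASK_ID to pick the input file or reducer number are my own choices, not a standard):

```python
# mr_task.py -- one script used by both array jobs:
#   qsub -t 1-<num inputs>   ... python mr_task.py map /shared/input
#   qsub -t 1-<num reducers> ... python mr_task.py reduce
import glob
import os
import sys
import zlib
from itertools import groupby
from operator import itemgetter

NUM_REDUCERS = 4
SHUFFLE_DIR = "/shared/shuffle"  # network directory visible to all nodes

def mapper(line):
    # Example map function: word count.
    for word in line.split():
        yield word, 1

def reducer(key, values):
    # Example reduce function: sum the counts.
    return key, sum(int(v) for v in values)

def map_and_shuffle(input_path, mapper_id):
    # Array job 1: map one input file and partition its output by reducer.
    # zlib.crc32 is used instead of hash() so every mapper process sends
    # the same key to the same reducer.
    outs = [open(os.path.join(SHUFFLE_DIR,
                              "mapper%05d_reducer%05d" % (mapper_id, r)), "w")
            for r in range(NUM_REDUCERS)]
    with open(input_path) as f:
        for line in f:
            for key, value in mapper(line):
                r = zlib.crc32(key.encode("utf-8")) % NUM_REDUCERS
                outs[r].write("%s\t%s\n" % (key, value))
    for out in outs:
        out.close()

def sort_and_reduce(reducer_id):
    # Array job 2: gather every file for this reducer, sort by key,
    # group, and run the reduce function on each group.
    records = []
    for path in glob.glob(os.path.join(SHUFFLE_DIR,
                                       "mapper*_reducer%05d" % reducer_id)):
        with open(path) as f:
            for line in f:
                key, value = line.rstrip("\n").split("\t", 1)
                records.append((key, value))
    records.sort(key=itemgetter(0))
    for key, group in groupby(records, key=itemgetter(0)):
        print("%s\t%s" % reducer(key, (v for _, v in group)))

if __name__ == "__main__":
    # SGE sets SGE_TASK_ID (1-based) for each element of a "qsub -t" array job.
    task_id = int(os.environ["SGE_TASK_ID"]) - 1
    if sys.argv[1] == "map":
        inputs = sorted(glob.glob(os.path.join(sys.argv[2], "*")))
        map_and_shuffle(inputs[task_id], task_id)
    else:
        sort_and_reduce(task_id)
```

The second array job can be held on the first with qsub's -hold_jid option, so the sort/reduce tasks don't start until all of the shuffle files exist.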
Unfortunately, Hadoop streaming isn't very full-featured.
Upvotes: 2