Reputation: 2077
I have a few problems that may fit the Map-Reduce model well. I'd like to experiment with implementing them, but at this stage I don't want to go to the trouble of installing a heavyweight system like Hadoop or Disco.
Is there a lightweight Python framework for map-reduce which uses the regular filesystem for input, temporary files, and output?
Upvotes: 11
Views: 16779
Reputation: 355
MockMR - https://github.com/sjtrny/mockmr
It's meant for educational use. It does not currently run in parallel, but it accepts standard Python objects as input and output.
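To make the idea concrete, here is a minimal sketch of a sequential, in-memory map-reduce that works on plain Python objects. It uses only the standard library and is not MockMR's API; the names `map_reduce`, `word_mapper`, and `count_reducer` are hypothetical and chosen just for this illustration.

```python
from collections import defaultdict
from itertools import chain


def map_reduce(records, mapper, reducer):
    """Run a sequential, in-memory map-reduce over an iterable of records.

    Hypothetical helper for illustration; not taken from any library.
    """
    # Map phase: each record yields zero or more (key, value) pairs.
    pairs = chain.from_iterable(mapper(record) for record in records)

    # Shuffle phase: group all emitted values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)

    # Reduce phase: collapse each key's list of values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    # Emit (word, 1) for every whitespace-separated word, case-insensitively.
    for word in line.lower().split():
        yield word, 1


def count_reducer(key, values):
    # Sum the per-occurrence counts for one word.
    return sum(values)


lines = ["the quick brown fox", "the lazy dog", "The end"]
counts = map_reduce(lines, word_mapper, count_reducer)
print(counts["the"])  # 3
```

Everything runs in one process and holds all intermediate pairs in memory, so this is only suitable for experimenting with the shape of a mapper and reducer, not for large inputs.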
Upvotes: 1
Reputation: 277
So this was asked ages ago, but I worked on a full implementation of mapreduce over the weekend: remap.
https://github.com/gtoonstra/remap
It's pretty easy to install and has minimal dependencies. If all goes well, you should be able to do a test run in 5 minutes.
The entire processing pipeline works, but submitting and monitoring jobs is still being worked on.
Upvotes: 0
Reputation: 8014
Check out Apache Spark. It is written in Scala and runs on the JVM, but it also has a Python API (PySpark). You can try it locally on your machine and then, when you need to, easily distribute your computation over a cluster.
Upvotes: 1
Reputation: 31
http://jsmapreduce.com/ -- in-browser MapReduce, in Python or JavaScript, with nothing to install
Upvotes: 3
Reputation: 533
http://pythonhosted.org/mrjob/ is great for getting started quickly on your local machine. Basically, all you need is a simple:
pip install mrjob
Upvotes: 6
Reputation: 150
A Coursera course dedicated to big data suggests using these lightweight Python Map-Reduce frameworks:
To get you started very quickly, try this example:
https://github.com/michaelfairley/mincemeatpy/zipball/v0.1.2
(hint: for [server address] in this example use localhost)
Upvotes: 11