Reputation: 19918
Specifically, I am trying to find a way to compute the shortest path in a graph using MapReduce. The approach I have come up with seems to require multiple rounds of MapReduce. However, none of the Hadoop documentation I have read so far clearly describes running MapReduce jobs that have multiple stages, i.e. taking the output of the reducer from the first stage and feeding it as input to the mapper of the next stage. I am hoping something like this is allowed in Hadoop.
Upvotes: 3
Views: 8696
Reputation: 2294
Steve's answer is essentially correct. For each step, you set the InputFormat's input directory to the previous step's output directory, and repeat this for as many iterations as you require. However, MapReduce is a poor abstraction for iterative graph problems. Take a look at Apache Giraph, which is designed specifically for these types of situations. You'll find your problem easier to express, and the iterative nature of the problem is taken care of for you.
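To make the directory chaining concrete, here is a minimal sketch in plain Java that deliberately avoids the Hadoop API so it runs anywhere. Everything in it is an assumption for illustration: `runRound` is a hypothetical stand-in for submitting one job, the `iter-N` directory names and the `part-00000` file name are just a convention chosen here, and the "job" merely appends a `+` to each line. In a real Hadoop driver the same loop would point each job's input path at the previous job's output path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class ChainedRounds {
    // Assumed file name for the single output file of each round.
    static final String PART = "part-00000";

    /** Hypothetical stand-in for one job: reads inputDir, writes outputDir. */
    static void runRound(Path inputDir, Path outputDir) {
        try {
            Files.createDirectories(outputDir);
            List<String> out = new ArrayList<>();
            for (String line : Files.readAllLines(inputDir.resolve(PART))) {
                out.add(line + "+"); // placeholder for the real map/reduce work
            }
            Files.write(outputDir.resolve(PART), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs the given number of rounds; each round reads the previous round's output. */
    static Path runChain(Path baseDir, Path firstInput, int rounds) {
        Path input = firstInput;
        for (int i = 1; i <= rounds; i++) {
            Path output = baseDir.resolve("iter-" + i);
            runRound(input, output);
            input = output; // the next round consumes what this round produced
        }
        return input; // directory holding the final result
    }

    /** Small demo: seeds a temp directory with two lines and chains the rounds. */
    static List<String> demo(int rounds) {
        try {
            Path base = Files.createTempDirectory("chain");
            Path first = Files.createDirectories(base.resolve("iter-0"));
            Files.write(first.resolve(PART), List.of("a", "b"));
            return Files.readAllLines(runChain(base, first, rounds).resolve(PART));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Writing each round to a fresh directory, rather than overwriting one, matters in practice because Hadoop refuses to start a job whose output directory already exists.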
Upvotes: 0
Reputation: 2199
I think you can refer to this example: http://famousphil.com/blog/2011/06/a-hadoop-mapreduce-solution-to-dijkstra%E2%80%99s-algorithm/
Upvotes: 0
Reputation: 20969
I have blogged about it here:
http://codingwiththomas.blogspot.com/2011/04/controlling-hadoop-job-recursion.html
The post is even graph-algorithm related, so you will end up with much the same code.
The basic idea is that you have a counter that measures how many vertices have been updated in a single MapReduce step. You then schedule jobs again and again until a step updates no vertices.
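As a rough illustration of that loop, here is an in-memory sketch in plain Java. It is not the code from the blog post and does not use Hadoop: `relaxOnce` is a hypothetical stand-in for one MapReduce step, and its return value plays the role of the counter. It assumes non-negative edge weights, so the loop is guaranteed to stop. In a real Hadoop driver the count would instead be read from the finished job's counters before deciding whether to submit another job.

```java
import java.util.HashMap;
import java.util.Map;

final class IterativeShortestPath {
    static final int INF = Integer.MAX_VALUE;

    /**
     * Simulates one MapReduce step. The "map" part emits a candidate distance
     * for every edge leaving a reached vertex; the "reduce" part keeps the
     * minimum per vertex. Returns how many vertices improved (the counter).
     */
    static int relaxOnce(Map<String, Map<String, Integer>> graph,
                         Map<String, Integer> dist) {
        Map<String, Integer> candidates = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> vertex : graph.entrySet()) {
            int d = dist.getOrDefault(vertex.getKey(), INF);
            if (d == INF) {
                continue; // vertex not reached yet, nothing to emit
            }
            for (Map.Entry<String, Integer> edge : vertex.getValue().entrySet()) {
                candidates.merge(edge.getKey(), d + edge.getValue(), Math::min);
            }
        }
        int updated = 0;
        for (Map.Entry<String, Integer> c : candidates.entrySet()) {
            if (c.getValue() < dist.getOrDefault(c.getKey(), INF)) {
                dist.put(c.getKey(), c.getValue());
                updated++;
            }
        }
        return updated;
    }

    /** Driver: reschedules steps until a step updates no vertex. */
    static Map<String, Integer> shortestPaths(Map<String, Map<String, Integer>> graph,
                                              String source) {
        Map<String, Integer> dist = new HashMap<>();
        dist.put(source, 0);
        while (relaxOnce(graph, dist) > 0) {
            // Each pass corresponds to submitting one more job.
        }
        return dist;
    }
}
```

For a graph with edges A→B (1), A→C (4), B→C (2) and C→D (1), calling `shortestPaths(graph, "A")` runs four steps: three that update at least one vertex and a final one that updates none, which ends the loop.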
But seriously, MapReduce sucks for graph algorithms; use a better framework like Apache Hama instead.
Apache Giraph can be helpful for you as well.
Upvotes: 2