Reputation: 19918

Iterative map reduce jobs. How to take reducer output and feed it to the next stage?

Specifically, I am trying to find a way to compute the shortest path in a graph using MapReduce. The approach I have come up with seems to require multiple rounds of MapReduce. However, none of the Hadoop documentation I have read so far clearly describes how to run MapReduce jobs with multiple stages, i.e. taking the output of the reducer from the first stage and feeding it as input to the mapper of the next stage. I am hoping something like this is possible in Hadoop.

Upvotes: 3

Answers (3)

Jakob Homan

Reputation: 2294

Steve's answer is essentially correct. For each step, you set the input format's input directory to the previous step's output directory, and repeat this for as many iterations as you require. However, MapReduce is a poor abstraction for iterative graph problems. Take a look at Apache Giraph, which is designed specifically for these kinds of problems. You'll find your problem easier to express, and the iterative nature of the problem is taken care of for you.
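To make the directory chaining concrete, here is a minimal sketch in plain Python rather than Hadoop API code. It only simulates the pattern: each round reads the previous round's output directory and writes a new one. The record layout (`node<TAB>distance<TAB>neighbour:weight,...`), the directory names `iter0`, `iter1`, ... and the function names are all assumptions made for illustration, and every vertex is assumed to have its own record.

```python
import os


def read_records(directory):
    """Read every file in a directory into {node: (distance, adjacency_string)}."""
    records = {}
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name)) as f:
            for line in f:
                node, dist, adj = line.rstrip("\n").split("\t")
                records[node] = (float(dist), adj)
    return records


def run_round(input_dir, output_dir):
    """One simulated map-reduce round of shortest-path relaxation."""
    records = read_records(input_dir)
    # "Map": every node re-emits its own distance and proposes
    # distance + edge weight to each of its neighbours.
    proposals = {node: [dist] for node, (dist, _) in records.items()}
    for node, (dist, adj) in records.items():
        for edge in filter(None, adj.split(",")):
            neighbour, weight = edge.split(":")
            proposals[neighbour].append(dist + float(weight))
    # "Reduce": keep the smallest proposed distance per node.
    os.makedirs(output_dir)
    with open(os.path.join(output_dir, "part-00000"), "w") as out:
        for node in sorted(records):
            out.write(f"{node}\t{min(proposals[node])}\t{records[node][1]}\n")


def run_chain(base_dir, rounds):
    """Run a fixed number of rounds, feeding each output directory to the next round."""
    input_dir = os.path.join(base_dir, "iter0")  # hypothetical initial input
    for i in range(1, rounds + 1):
        output_dir = os.path.join(base_dir, f"iter{i}")
        run_round(input_dir, output_dir)
        input_dir = output_dir  # previous output becomes the next input
    return input_dir
```

The only part that carries over to a real Hadoop driver is the last line of the loop: the next job's input path is whatever the previous job wrote. The fixed `rounds` argument is a simplification; a real driver would normally stop based on some convergence signal.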

Upvotes: 0

Anuj Kulkarni

Reputation: 2199

I think you can refer to this example: http://famousphil.com/blog/2011/06/a-hadoop-mapreduce-solution-to-dijkstra%E2%80%99s-algorithm/

Upvotes: 0

Thomas Jungblut

Reputation: 20969

I have blogged about it here:

http://codingwiththomas.blogspot.com/2011/04/controlling-hadoop-job-recursion.html

It is even about a graph algorithm, so you will end up with much the same code.

The basic idea is that you have a counter that measures how many vertices were updated in a single MapReduce step. You then schedule jobs again and again until no vertices are updated anymore.

But seriously, MapReduce sucks for graph algorithms; use a better framework like Apache Hama instead.
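As a rough illustration of that loop, here is an in-memory Python sketch, not the code from the blog post and not Hadoop API code. The names `map_phase`, `reduce_phase`, `shortest_paths` and the `max_rounds` safety cap are assumptions for this example. The `updated` value plays the role of the counter: in Hadoop it would be a job counter incremented during the job and read by the driver once the job completes.

```python
from collections import defaultdict

INF = float("inf")


def map_phase(vertices):
    """Emit each vertex's own distance plus a candidate distance per outgoing edge."""
    for vertex, (dist, edges) in vertices.items():
        yield vertex, dist
        if dist < INF:  # unreachable vertices have nothing useful to propose
            for neighbour, weight in edges.items():
                yield neighbour, dist + weight


def reduce_phase(vertices, pairs):
    """Keep the minimum distance per vertex and count how many vertices improved."""
    grouped = defaultdict(list)
    for vertex, dist in pairs:
        grouped[vertex].append(dist)
    updated = 0  # stands in for the job counter
    result = {}
    for vertex, (old_dist, edges) in vertices.items():
        new_dist = min(grouped[vertex])
        if new_dist < old_dist:
            updated += 1
        result[vertex] = (new_dist, edges)
    return result, updated


def shortest_paths(vertices, max_rounds=100):
    """Re-run the map/reduce step until a round updates no vertex."""
    rounds = 0
    while rounds < max_rounds:
        vertices, updated = reduce_phase(vertices, map_phase(vertices))
        rounds += 1
        if updated == 0:  # counter is zero: nothing changed, so stop scheduling
            break
    distances = {vertex: dist for vertex, (dist, _) in vertices.items()}
    return distances, rounds
```

A graph is passed as `{vertex: (distance, {neighbour: weight})}`, with the source at distance 0 and every other vertex at `INF`. Note that the final round does no useful work; it only confirms that the counter has reached zero, which is the price of detecting convergence this way.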

Apache Giraph can be helpful for you as well.

Upvotes: 2
