Reputation: 1905
I'm considering using hadoop/mapreduce to tackle a project and haven't quite figured out how to set up a job flow consisting of a variable number of levels that should be processed in sequence.
E.g.:
Job 1: Map source data into X levels.
Job 2: MapReduce Level1 -> appends to Level2
Job 3: MapReduce Level2 -> appends to Level3
Job N: MapReduce LevelN -> appends to LevelN+1
And so on until the final level. The key is that each level must include its own specific source data as well as the results of the previous level.
I've looked at Pig, Hive, Hamake, and Cascading, but have yet to see clear support for something like this.
Does anyone know an efficient way of accomplishing this? Right now I'm leaning towards writing a wrapper for Hamake that will generate the Hamake file based on parameters (the number of levels is known at runtime but could change with each run).
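To make the flow concrete, this is roughly the chain I want, expressed as a plain Java driver (a minimal sketch only; LevelMapper, LevelReducer, and the /data paths are hypothetical, and most job setup is elided):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LevelChainDriver {
    public static void main(String[] args) throws Exception {
        int levels = Integer.parseInt(args[0]); // known at runtime, varies per run
        Configuration conf = new Configuration();

        Path previous = new Path("/data/level1"); // output of Job 1 (the initial map)
        for (int level = 2; level <= levels; level++) {
            Job job = new Job(conf, "level-" + level);
            job.setJarByClass(LevelChainDriver.class);
            job.setMapperClass(LevelMapper.class);   // hypothetical
            job.setReducerClass(LevelReducer.class); // hypothetical
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);

            // Each level reads its own source data plus the previous level's results.
            FileInputFormat.addInputPath(job, new Path("/data/source/level" + level));
            FileInputFormat.addInputPath(job, previous);

            Path out = new Path("/data/level" + level);
            FileOutputFormat.setOutputPath(job, out);

            if (!job.waitForCompletion(true)) {
                System.exit(1); // abort the chain if a level fails
            }
            previous = out; // this level's results feed the next level
        }
    }
}
```

Since Hadoop won't append to an existing output directory, each level writes to its own directory, and "append" really means feeding both that directory and the level's own source data into the next job's input.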
Thanks!
Upvotes: 1
Views: 503
Reputation: 19666
You should be able to generate the Pig code for this pretty easily using Piglet, the Ruby Pig DSL: http://github.com/iconara/piglet
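Since the number of levels is only known at run time, the whole trick is just emitting the script in a loop; here's a rough sketch of that idea in plain Java (hypothetical paths and a placeholder per-level transform; Piglet lets you write the same loop in Ruby instead of building strings):

```java
import java.io.PrintWriter;

public class PigScriptGenerator {
    public static void main(String[] args) throws Exception {
        int levels = Integer.parseInt(args[0]);
        try (PrintWriter out = new PrintWriter("levels.pig")) {
            out.println("prev = LOAD '/data/level1';");
            for (int level = 2; level <= levels; level++) {
                // Each level unions its own source data with the previous level's results.
                out.printf("src%d = LOAD '/data/source/level%d';%n", level, level);
                out.printf("in%d = UNION prev, src%d;%n", level, level);
                out.printf("prev = FOREACH in%d GENERATE *; -- placeholder transform%n", level);
                out.printf("STORE prev INTO '/data/level%d';%n", level);
            }
        }
    }
}
```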
Upvotes: 0
Reputation: 1275
Oozie (http://yahoo.github.com/oozie/) is an open-source workflow server that Yahoo released to manage Hadoop and Pig job flows like the one you're describing.
Cloudera includes it in their latest distribution, with very good documentation: https://wiki.cloudera.com/display/DOC/Oozie+Installation
Here is a video from Yahoo: http://sg.video.yahoo.com/watch/5936767/15449686
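To give a rough idea of what the workflow definition looks like, here is a sketch of a workflow.xml with one map-reduce action per level (only level 2 shown; the names, paths, and property values are hypothetical, and the mapper/reducer properties are elided):

```xml
<workflow-app name="level-chain" xmlns="uri:oozie:workflow:0.1">
  <start to="level2"/>
  <action name="level2">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <!-- each level reads its own source data plus the previous level's output -->
        <property>
          <name>mapred.input.dir</name>
          <value>/data/level1,/data/source/level2</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>/data/level2</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/> <!-- in a generated workflow this would point at the next level's action -->
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>level job failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Since the number of levels varies per run, you would generate this file from your parameters rather than writing it by hand.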
Upvotes: 1