How to schedule post processing task after a mapreduce job

Question

I'm looking for a simple way to chain post-processing code after a MapReduce job.

Specifically, it involves renaming/moving the output files created by org.apache.hadoop.mapred.lib.MultipleOutputs (the class has limitations on the output file names, so I can't produce the files directly in the MapReduce job).

The options I know (or think of) are:

Add it to the job creation code. This is what I do now, but I would prefer that the task be scheduled by the JobTracker (to reduce the chance of the process being aborted).
Use a workflow engine (Luigi, Oozie), but this seems like overkill for this issue.
Use job chaining. This allows chaining MapReduce jobs; is it possible to chain a "simple" task?
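
For reference, the rename step from the first option might look roughly like the sketch below. The class name RenameOutputs, the helpers targetName and renameAll, the assumed input naming (files like stats-r-00000) and the target format (stats_00000.txt) are all hypothetical and only serve as illustration. It uses java.nio.file on a local directory as a stand-in so it runs without a cluster; on HDFS the same loop would presumably go through org.apache.hadoop.fs.FileSystem (listStatus and rename), called from the driver once the job has completed successfully.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class RenameOutputs {
    // Assumption: named outputs look like <name>-m-<5 digits> or <name>-r-<5 digits>.
    private static final Pattern NAMED_OUTPUT =
            Pattern.compile("([A-Za-z0-9]+)-[mr]-(\\d{5})");

    // Maps an output file name to the desired name (hypothetical target format),
    // or returns empty for files that should be left alone (e.g. _SUCCESS, part-00000).
    static Optional<String> targetName(String fileName) {
        Matcher m = NAMED_OUTPUT.matcher(fileName);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(m.group(1) + "_" + m.group(2) + ".txt");
    }

    // Renames every matching file in outputDir and returns the new names in sorted order.
    static List<String> renameAll(Path outputDir) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(outputDir)) {
            files = listing.sorted().collect(Collectors.toList());
        }
        List<String> renamed = new ArrayList<>();
        for (Path file : files) {
            Optional<String> target = targetName(file.getFileName().toString());
            if (target.isPresent()) {
                // Local stand-in for FileSystem.rename(src, dst) on HDFS.
                Files.move(file, file.resolveSibling(target.get()));
                renamed.add(target.get());
            }
        }
        return renamed;
    }

    public static void main(String[] args) throws IOException {
        // In a real driver this would run only after the job reports success.
        if (args.length == 1) {
            System.out.println(renameAll(Paths.get(args[0])));
        }
    }
}
```

This still runs in the client process, so it does not address the concern about the process being aborted; it only shows the rename logic that any of the three options would need to execute.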


Answers (1)
