Ophir Yoktan

Reputation: 8449

How to schedule a post-processing task after a MapReduce job

I'm looking for a simple way to chain post-processing code after a MapReduce job.

Specifically, it involves renaming/moving the output files created by org.apache.hadoop.mapred.lib.MultipleOutputs (the class has limitations on the output file names, so I can't produce the files with their final names directly in the MapReduce job).

The options I know (or think of) are:

Upvotes: 0

Views: 537

Answers (1)

Evgeny Benediktov

Reputation: 1399

Your "simple" task can be a mapper-only job. Your map() receives the file name as its key and renames the file. For this you have to write your own InputFormat and RecordReader, like the ones in the links below, except that your RecordReader should not actually read the file; it should just return the file name from getCurrentKey():

https://code.google.com/p/hadoop-course/source/browse/HadoopSamples/src/main/java/mr/wholeFile/WholeFileInputFormat.java?r=3

https://code.google.com/p/hadoop-course/source/browse/HadoopSamples/src/main/java/mr/wholeFile/WholeFileRecordReader.java?r=3
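Once map() receives a file name as its key, the per-file work is small: derive the target name and rename. Below is a minimal sketch of that step only, not a full job. It is written against java.nio.file on the local file system so it runs without a cluster; in a real mapper the move would presumably go through Hadoop's FileSystem.rename(Path, Path) instead. The class OutputRenamer, the assumed `<name>-m-NNNNN` / `<name>-r-NNNNN` input pattern, and the `<name>.<n>.txt` target scheme are all hypothetical choices for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the work a rename-only map() would do for one file name.
final class OutputRenamer {
    // Assumption: output files are named <namedOutput>-m-NNNNN or <namedOutput>-r-NNNNN.
    private static final Pattern PART_NAME =
            Pattern.compile("^([A-Za-z0-9_]+)-[mr]-(\\d{5})$");

    // Illustrative naming rule (an assumption): "errors-r-00003" becomes "errors.3.txt".
    // Names that do not match the pattern are returned unchanged.
    static String targetName(String original) {
        Matcher m = PART_NAME.matcher(original);
        if (!m.matches()) {
            return original;
        }
        return m.group(1) + "." + Integer.parseInt(m.group(2)) + ".txt";
    }

    // Renames the file inside its own directory and returns the resulting path.
    // Local stand-in for the HDFS rename a real mapper would issue.
    static Path renameOutput(Path file) throws IOException {
        Path target = file.resolveSibling(targetName(file.getFileName().toString()));
        if (target.equals(file)) {
            return file; // nothing to do for non-matching names
        }
        return Files.move(file, target);
    }
}
```

Keeping the naming rule in a pure function such as targetName makes it easy to check on its own, separately from the job that feeds it file names.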

Upvotes: 1
