Mike Park

Reputation: 10931

Using Oozie to combine output file parts

Is it possible to use Oozie to concatenate the output of a MapReduce job into a single file? Let's say I have the output ...

part-r-00000
part-r-00001
part-r-00002

and I just want...

output.csv

I know I can pull them down as a single file with hadoop fs -getmerge, but I'm curious if it's possible with a workflow application and HDFS.

Upvotes: 3

Views: 1260

Answers (2)

Rick Moritz

Reputation: 1518

You can probably use Pig or Java to call FileSystem.concat:

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#concat-org.apache.hadoop.fs.Path-org.apache.hadoop.fs.Path:A-

or maybe add it to your own fork of Oozie's fs-action.

Alternatively, use the WebHDFS concat operation: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Concat_Files

You could wrap that curl call in a shell or ssh action.
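As a rough illustration of the WebHDFS route, here is a minimal Python sketch that builds and sends the CONCAT request described in the WebHDFS docs linked above. The NameNode address, the paths, and the helper names build_concat_url and concat_parts are assumptions made up for this example; they are not part of Oozie or Hadoop. Keep in mind that concat appends the sources onto an existing target file and the sources go away, so the result keeps the target's name. Getting output.csv would still take a rename afterwards.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def build_concat_url(namenode, target, sources, user=None):
    """Build the WebHDFS URL that concatenates `sources` onto `target`.

    namenode: base HTTP address of the NameNode (assumed, e.g. http://host:9870)
    target:   absolute HDFS path of the existing file that receives the data
    sources:  absolute HDFS paths appended to the target, in this order
    """
    params = [("op", "CONCAT"), ("sources", ",".join(sources))]
    if user:
        # Only meaningful on clusters using simple (non-Kerberos) authentication.
        params.append(("user.name", user))
    # Leave "/" and "," unescaped so the query string stays readable.
    query = urlencode(params, safe="/,")
    return f"{namenode}/webhdfs/v1{quote(target)}?{query}"


def concat_parts(namenode, target, sources, user=None):
    """Send the CONCAT request as an HTTP POST and return the status code."""
    request = Request(build_concat_url(namenode, target, sources, user), method="POST")
    with urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical host and paths; this only prints the URL, no request is sent.
    print(build_concat_url(
        "http://namenode.example.com:9870",
        "/out/part-r-00000",
        ["/out/part-r-00001", "/out/part-r-00002"],
    ))
```

The same URL works with curl -X POST, which is what a shell action would run.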

Upvotes: 0

Chris White

Reputation: 30089

Two simple options I can think of:

  1. Amend the job that produced this output to use a single reducer
  2. Run a map-reduce action with identity mapper, identity reducer and single reducer
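For either option, the single reducer can be requested in the configuration of the Oozie map-reduce action. The fragment below is only a sketch of that one property, not a complete workflow, and it assumes the job honours the old-API property name; newer Hadoop releases call it mapreduce.job.reduces.

```xml
<!-- Fragment of a map-reduce action's configuration; the rest of the action is omitted -->
<configuration>
    <property>
        <!-- Ask for exactly one reduce task so the job writes a single part file -->
        <name>mapred.reduce.tasks</name>
        <value>1</value>
    </property>
</configuration>
```

All of the data then passes through one reduce task, which can be slow for large outputs, and the file is still written under its part name, so a rename to output.csv is still needed.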

Upvotes: 2
