wj1091

Reputation: 159

How to output files with a specific extension (like .csv) in Hadoop, using MultipleOutputs class

I currently have a MapReduce program that uses MultipleOutputs to write its results to multiple files. The reducer looks like this:

private MultipleOutputs<NullWritable, Text> mo = new MultipleOutputs<>(context);
...
public void reduce(Edge keys, Iterable<NullWritable> values, Context context)
            throws IOException, InterruptedException {
        String date = records.formatDate(millis);
        out.set(keys.get(0) + "\t" + keys.get(1));
        parser.parse(key); 
        String filePath = String.format("%s/part", parser.getFileID());
        mo.write(noval, out, filePath);
    }

This is very similar to the example in the book Hadoop: The Definitive Guide. The problem is that the files come out as plain text. I want them written as .csv files, and I haven't found an explanation of how to do this in the book or online.

How can this be done?

Upvotes: 1

Views: 610

Answers (1)

Turbero

Reputation: 126

Have you tried iterating over the output folder in your driver after the Job completes, and renaming the files there?

As long as your reducer emits the CSV line as the Text value (with the fields separated by commas, semicolons, or whatever you need), you can try something like this:

Job job = Job.getInstance(getConf());
//...
//your job setup, including the output config
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(NullWritable.class);
//...
boolean success = job.waitForCompletion(true);
if (success) {
    FileSystem hdfs = FileSystem.get(getConf());
    // Append ".csv" to every file the job wrote into the output directory
    FileStatus[] fs = hdfs.listStatus(new Path(outputPath));
    if (fs != null) {
        for (FileStatus aFile : fs) {
            if (!aFile.isDirectory()) {
                hdfs.rename(aFile.getPath(), new Path(aFile.getPath().toString() + ".csv"));
            }
        }
    }
}
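The same rename pass can be sketched against a local directory with plain java.io, which is handy for testing the logic without a running HDFS cluster. This is an illustration only; the class name and the default "output" directory are hypothetical, and in the real job you would use FileSystem.rename as above:

```java
import java.io.File;
import java.io.IOException;

public class RenameToCsv {

    // Appends ".csv" to every plain file in dir, mirroring the
    // post-job HDFS rename loop on a local filesystem.
    static void addCsvExtension(File dir) throws IOException {
        File[] files = dir.listFiles();
        if (files == null) {
            return; // not a directory, or it does not exist
        }
        for (File f : files) {
            if (f.isFile() && !f.getName().endsWith(".csv")) {
                File target = new File(f.getParentFile(), f.getName() + ".csv");
                if (!f.renameTo(target)) {
                    throw new IOException("Could not rename " + f);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical output directory; pass your own as the first argument
        File dir = new File(args.length > 0 ? args[0] : "output");
        addCsvExtension(dir);
    }
}
```

A file named part-r-00000 in the directory would end up as part-r-00000.csv; files that already end in .csv are left alone.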

Upvotes: 2
