Reputation: 8047
I have a WordCount.java program and modified it to support setting the number of mappers and reducers from the command line, like below:
public class WordCount extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getConf(), WordCount.class);
        for (int i = 0; i < args.length; ++i) {
            if ("-m".equals(args[i])) {
                // -m N: suggest N map tasks (only a hint to the framework)
                conf.setNumMapTasks(Integer.parseInt(args[++i]));
            } else if ("-r".equals(args[i])) {
                // -r N: run exactly N reduce tasks
                conf.setNumReduceTasks(Integer.parseInt(args[++i]));
            } else {
                // remaining arguments: input and output paths
            }
...
Then I compile and run it:
hadoop jar WordCount-1.0-SNAPSHOT.jar WordCount -m 3 -r 15 input output-18
It runs well, and when I check the output directory:
$ hdfs dfs -ls output-18
Found 16 items
output-18/_SUCCESS
output-18/part-00000
output-18/part-00001
output-18/part-00002
output-18/part-00003
output-18/part-00004
output-18/part-00005
output-18/part-00006
output-18/part-00007
output-18/part-00008
output-18/part-00009
output-18/part-00010
output-18/part-00011
output-18/part-00012
output-18/part-00013
output-18/part-00014
OK, 15 reducers produce 15 part-xxxxx files, as I expected. But where is the final result that merges these 15 reducer outputs into one file? I don't see it anywhere in the HDFS directory. I should get a single word-count file instead of 15 files, right?
Upvotes: 0
Views: 113
Reputation: 328
MapReduce does not merge the reducers' output files into a single file. You can either use the following command to merge them into one file on the local machine, or run another MapReduce job to do the merge on HDFS:
hadoop fs -getmerge /hdfs/output/dir/ /single/output/file.txt
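If the merged result needs to stay on HDFS, the second option is a follow-up job that passes every record through unchanged and uses a single reducer, so only one part-00000 file is written. Here is a minimal sketch using the same old mapred API as the question; the class name MergeParts and the path arguments are illustrative, and it assumes the word-count output lines are word<TAB>count pairs:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class MergeParts {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MergeParts.class);
        conf.setJobName("merge-parts");

        // Split each input line on the first tab: the word becomes the key,
        // the count becomes the value.
        conf.setInputFormat(KeyValueTextInputFormat.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);

        // Identity map and reduce: records pass through unchanged.
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);

        // One reducer means exactly one part-00000 file in the output directory.
        conf.setNumReduceTasks(1);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));   // e.g. output-18
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));  // e.g. output-merged

        JobClient.runJob(conf);
    }
}

Note that since your driver already parses -r, re-running the original job with -r 1 would also yield a single output file, at the cost of funneling all keys through one reducer.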
Upvotes: 3