NamoBhagavan

Reputation: 35

Unexpected output from Hadoop word count

I modified the code below so that it outputs only the words that occur at least ten times, but it does not work: the output file does not change at all. What do I have to do to make it work?

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.*;
// ...
public class WordCount extends Configured implements Tool {
// ...
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}

public static class Reduce extends
        Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {

        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        // This is the part I modified, but it has no effect: the output file did not change
        if(sum >= 10)
        {
            context.write(key, new IntWritable(sum));
        }
    }
}

public int run(String[] args) throws Exception {
    Job job = new Job(getConf());
    job.setJarByClass(WordCount.class);
    job.setJobName("wordcount");

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    job.setMapperClass(Map.class);
    //job.setCombinerClass(Reduce.class);
    job.setReducerClass(Reduce.class);

    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    boolean success = job.waitForCompletion(true);
    return success ? 0 : 1;
}

public static void main(String[] args) throws Exception {
    int ret = ToolRunner.run(new WordCount(), args);
    System.exit(ret);
}
}
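Note that the setCombinerClass line is commented out, and with this filtering Reduce it should stay that way. A combiner runs on each map task's partial sums, so reusing Reduce as a combiner would apply the ">= 10" test to those partial sums and could drop a word whose total is 10 or more. The plain-Java sketch below (no Hadoop; the class name CombinerPitfallSketch is hypothetical) simulates two input splits to show the effect. It assumes the combiner runs exactly once per split, whereas Hadoop may run it zero or more times.

```java
import java.util.*;

// Plain-Java simulation (no Hadoop) of why the filtering Reduce
// should not also be registered as a combiner.
class CombinerPitfallSketch {
    static final int THRESHOLD = 10;

    // Count the words of one input split (Map output summed per word).
    static Map<String, Integer> countSplit(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // The reducer's filter: keep only entries whose count is >= THRESHOLD.
    static Map<String, Integer> filter(Map<String, Integer> counts) {
        Map<String, Integer> kept = new TreeMap<>();
        counts.forEach((word, n) -> { if (n >= THRESHOLD) kept.put(word, n); });
        return kept;
    }

    // Sum the per-split counts (shuffle + reduce), then apply the filter.
    static Map<String, Integer> run(List<List<String>> splits, boolean filterAsCombiner) {
        Map<String, Integer> totals = new TreeMap<>();
        for (List<String> split : splits) {
            Map<String, Integer> partial = countSplit(split);
            if (filterAsCombiner) {
                partial = filter(partial); // the "combiner" drops small partial sums
            }
            partial.forEach((w, n) -> totals.merge(w, n, Integer::sum));
        }
        return filter(totals);
    }

    public static void main(String[] args) {
        // "hadoop" appears 6 times in each of two splits: 12 in total.
        List<String> split = Collections.nCopies(6, "hadoop");
        List<List<String>> splits = List.of(split, split);
        System.out.println(run(splits, false)); // {hadoop=12}
        System.out.println(run(splits, true));  // {}
    }
}
```

Without the combiner the word survives with its true total of 12; with the filter applied per split, both partial sums of 6 are discarded and the word disappears.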

Upvotes: 1

Views: 290

Answers (4)

Chaos

Reputation: 11721

The code is definitely correct. Maybe you are reading the output that was generated before you modified the code, or maybe you did not rebuild the jar file you used previously after modifying the code?
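One quick way to tell whether the file you are reading came from the modified reducer: with the filter in place, no line can have a count below 10. The sketch below assumes the default TextOutputFormat layout of "word<TAB>count" per line and that you pass it the lines of a local copy of the output (for example obtained with hadoop fs -cat on the part file). The class StaleOutputCheck is a hypothetical helper, not part of Hadoop.

```java
import java.util.*;

// Scans reducer output in TextOutputFormat's default "key<TAB>value" layout
// and returns the lines whose count is below the threshold.
class StaleOutputCheck {
    static List<String> linesBelow(List<String> outputLines, int threshold) {
        List<String> offending = new ArrayList<>();
        for (String line : outputLines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) continue; // not a key/value line; skip it
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            if (count < threshold) offending.add(line);
        }
        return offending;
    }

    public static void main(String[] args) {
        // Sample lines standing in for the contents of the output file.
        List<String> sample = List.of("apple\t3", "hadoop\t12");
        List<String> bad = linesBelow(sample, 10);
        if (bad.isEmpty()) {
            System.out.println("Consistent with the filtering reducer.");
        } else {
            System.out.println("Not produced by the filtering reducer: " + bad);
        }
    }
}
```

If the list is non-empty, the output was written by an older build of the job (or by a different run) rather than by the code above.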

Upvotes: 0

Niels Basjes

Reputation: 10652

The code looks valid. To be able to help you, we need at least the command line you used to run this. It would also help if you could post the actual output when you feed it a file like this:

one
two two
three three three

and so on, up to twenty.
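A small sketch of how such a test file could be generated, together with what the modified job should emit for it (only "ten" through "twenty", since each of those words occurs at least 10 times). The class ThresholdTestInput and its word list are hypothetical; the counting mirrors the question's whitespace tokenizing plus the "sum >= 10" filter, computed in plain Java without Hadoop.

```java
import java.util.*;

// Builds the suggested test input (line i holds word i repeated i times,
// for one..twenty) and computes what the modified job should emit for it.
class ThresholdTestInput {
    static final String[] WORDS = {
        "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten",
        "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen",
        "eighteen", "nineteen", "twenty"
    };

    static List<String> inputLines() {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < WORDS.length; i++) {
            // Line i+1 holds WORDS[i] repeated i+1 times, separated by spaces.
            lines.add(String.join(" ", Collections.nCopies(i + 1, WORDS[i])));
        }
        return lines;
    }

    // Word count followed by the reducer's "sum >= threshold" filter.
    static Map<String, Integer> expectedOutput(List<String> lines, int threshold) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        counts.values().removeIf(n -> n < threshold); // drop words seen fewer times
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = inputLines();
        lines.forEach(System.out::println);            // contents of the test file
        System.out.println(expectedOutput(lines, 10)); // only ten..twenty remain
    }
}
```

If the real job's output for this file still contains "one" through "nine", the filtering reducer is not what actually ran.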

Upvotes: 0

RFT

Reputation: 1071

You can look at the default Hadoop counters (the job prints them when it finishes, because it calls waitForCompletion(true)) to get an idea of what is happening.
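For a small sample input you can work out in advance what some of those counters should read and compare them with the job's log. The sketch below assumes the job has no combiner, as in the question, and targets three counters from the Map-Reduce Framework group: Map output records (one per token), Reduce input groups (distinct words) and Reduce output records (words that pass the filter). If Reduce output records equals Reduce input groups on the real run, either no word fell below 10 or the filter is not running. ExpectedCounters is a hypothetical helper computed in plain Java, not a Hadoop API.

```java
import java.util.*;

// Plain-Java estimate of three counters printed by waitForCompletion(true),
// assuming no combiner is configured (as in the question's job).
class ExpectedCounters {
    // Returns {Map output records, Reduce input groups, Reduce output records}.
    static long[] expected(List<String> lines, int threshold) {
        Map<String, Integer> counts = new HashMap<>();
        long mapOutputRecords = 0;
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
                mapOutputRecords++; // the mapper emits one (word, 1) pair per token
            }
        }
        long reduceOutputRecords =
            counts.values().stream().filter(n -> n >= threshold).count();
        return new long[] {mapOutputRecords, counts.size(), reduceOutputRecords};
    }

    public static void main(String[] args) {
        long[] c = expected(List.of("spam spam", "eggs"), 2);
        System.out.println("Map output records    = " + c[0]);
        System.out.println("Reduce input groups   = " + c[1]);
        System.out.println("Reduce output records = " + c[2]);
    }
}
```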

Upvotes: 0

David Gruzman

Reputation: 8088

The code looks completely valid. I suspect your dataset may be big enough that every word happens to appear at least 10 times, in which case the filter removes nothing. Please also make sure that you are indeed looking at the new results.

Upvotes: 1
