user629034

Reputation: 669

Hadoop and MapReduce

I am new to HDFS and MapReduce and am trying to calculate survey statistics. Each line of the input file has this format: Age Points Sex Category (all 4 fields are numbers). Is this the correct start:

    public static class MapClass extends MapReduceBase
            implements Mapper<IntWritable, IntWritable, IntWritable, IntWritable> {
        private final static IntWritable Age = new IntWritable(1);
        private IntWritable AgeCount = new IntWritable();

        public void map(Text key, Text value,
                        OutputCollector<IntWritable, IntWritable> output,
                        Reporter reporter) throws IOException {
            AgeCount.set(Integer.parseInt(value.toString()));
            output.collect(AgeCount, Age);
        }
    }

My questions:

1. Is this a correct start?
2. If I want to collect other attributes such as Sex or Points, do I just add more output.collect statements? I know I have to read the line and split it into attributes.
3. Where it says implements Mapper, I made all 4 type parameters IntWritable. Is that correct?

Upvotes: 3

Views: 1985

Answers (1)

diliop

Reputation: 9451

The Mapper interface expects 4 type parameters in the following order: map input key, map input value, map output key, and map output value. Your 4 integers arrive as one line of text, not as separate IntWritable values, so IntWritable is the wrong type for the map input. With the default TextInputFormat, the input key is the byte offset of the line (a LongWritable) and the input value is the line itself (a Text). Also, the types you specify in your MapClass definition do not match the types you pass to your map function. Given that you are dealing with text files, your MapClass should be defined as follows:

    public static class MapClass extends MapReduceBase
            implements Mapper<LongWritable, Text, IntWritable, IntWritable>

In essence, each map call receives one line of the input text file, which you split into the fields you want and parse into ints within the map function. So, your map function would then have the following definition:

    public void map(LongWritable key, Text value,
                    OutputCollector<IntWritable, IntWritable> output,
                    Reporter reporter) throws IOException {...}
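To illustrate the parsing step that would go inside the map function, here is a minimal sketch in plain Java with no Hadoop dependency. The class name SurveyLineParser and the method parseFields are hypothetical names introduced for this example, and the sketch assumes the four fields are separated by whitespace, which the question does not state explicitly.

```java
import java.util.Arrays;

public class SurveyLineParser {

    // Hypothetical helper (not part of Hadoop): splits one input line of the
    // form "Age Points Sex Category" on whitespace and parses each field as an int.
    // Assumption: fields are whitespace-separated; adjust the regex for other delimiters.
    public static int[] parseFields(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length != 4) {
            throw new IllegalArgumentException(
                    "Expected 4 fields, got " + tokens.length + ": " + line);
        }
        int[] fields = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // Integer.parseInt throws NumberFormatException on non-numeric input.
            fields[i] = Integer.parseInt(tokens[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Inside map(), the line would come from value.toString() instead.
        int[] fields = parseFields("34 120 1 3");
        System.out.println(Arrays.toString(fields)); // prints [34, 120, 1, 3]
    }
}
```

Inside the map function, you could call something like parseFields(value.toString()) and then pass whichever field you are counting to output.collect, for example fields[0] for Age wrapped in an IntWritable as the key. How you would tell the different attributes apart in the reducer if you emit several of them is a design choice this sketch does not cover.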

Upvotes: 4
