Suresh_Hadoop

Reputation: 147

Get Specific data from MapReduce

I have the following file as input, which consists of 10000 lines like this one:

250788965731,20090906,200937,200909,621,SUNDAY,WEEKEND,ON-NET,MORNING,OUTGOING,VOICE,25078,PAY_AS_YOU_GO_PER_SECOND_PSB,SUCCESSFUL-RELEASEDBYSERVICE,5,0,1,6.25,635-10-104-40163. 

I need to print the first column if the 18th column is less than 10 and the 9th column is MORNING. I wrote the following code, but I'm not getting any output; the output file is empty.

public static class MyMap extends Mapper<LongWritable, Text, Text, DoubleWritable> {

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        String[] day = line.split(",");
        double day1 = Double.parseDouble(day[17]);
        if (day[8] == "MORNING" && day1 < 10.0) {
            context.write(new Text(day[0]), new DoubleWritable(day1));
        }
    }
}
public static class MyReduce extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {

    public void reduce(Text key, Iterator<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {

        String no = values.toString();
        double no1 = Double.parseDouble(no);
        if (no1 > 10.0) {
            context.write(key, new DoubleWritable(no1));
        }
    }
}

Please tell me what I did wrong. Is the flow correct?

Upvotes: 3

Views: 2000

Answers (2)

Pramod Solanky

Reputation: 1700

  1. I believe this could be a mapper-only job, since your data already contains the values you want to check.
  2. Your mapper emits only values with day1 < 10.0, while your reducer writes only values with day1 > 10.0, so nothing is ever output by your reducers.

So I think your reducer should look like this:

String no = values.toString();
double no1 = Double.parseDouble(no);
if (no1 < 10.0) {
    context.write(key, new DoubleWritable(no1));
}

I think that should get your desired output.

Upvotes: 0

Charles Menguy

Reputation: 41428

I can see a few problems.

First, in your Mapper, you should use .equals() instead of == when comparing Strings. Otherwise you're only comparing references, and the comparison can fail even when the String objects' contents are identical. It might happen to succeed because of Java String interning, but I would avoid relying on that if equality of contents was the original intent.
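The difference is easy to demonstrate in plain Java; in this small sketch the literal and the split call simply stand in for "MORNING" and day[8]:

```java
public class StringCompare {
    public static void main(String[] args) {
        String literal = "MORNING";
        // A field produced by split(",") is a fresh String object,
        // not the interned literal, so == compares different references.
        String fromSplit = "X,MORNING".split(",")[1];

        System.out.println(literal == fromSplit);      // false: different objects
        System.out.println(literal.equals(fromSplit)); // true: same characters
    }
}
```

This is exactly why the mapper's `if` condition never fires even for lines that contain MORNING.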

In your Reducer, I am not sure what you want to achieve, but there are a few wrong things that I can spot anyway. The values parameter is an Iterable<DoubleWritable>, so you should iterate over it and apply whatever condition you need to each individual value. Here is how I would rewrite your Reducer:

public static class MyReduce extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {

    public void reduce(Text key, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {

        for (DoubleWritable val : values) {
            if (val.get() > 10.0) {
                context.write(key, val);
            }
        }
    }
}

But the overall logic doesn't make much sense. If all you want to do is print the first column when the 18th column is less than 10 and the 9th column is MORNING, then you could use a NullWritable as the output key of your mapper and write column 1, day[0], as your output value. You probably don't even need a Reducer in this case, which you can tell Hadoop with job.setNumReduceTasks(0);.

One thing that got me thinking: if your input is only 10k lines, do you really need a Hadoop job for this? It seems to me a simple shell script (for example with awk) would be enough for such a small dataset.
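To make that concrete, here is the same filter as a few lines of plain Java, no Hadoop required. The class and method names are made up for illustration, and the sample line is the one from the question; for the real file you would read the lines with java.nio.file.Files.readAllLines(...):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LocalFilter {

    // First column of every line whose 9th field is MORNING and whose
    // 18th field parses to a value below 10.
    static List<String> firstColumns(List<String> lines) {
        return lines.stream()
                .map(line -> line.split(","))
                .filter(f -> f.length > 17
                        && "MORNING".equals(f[8])
                        && Double.parseDouble(f[17]) < 10.0)
                .map(f -> f[0])
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample record from the question (9th field MORNING, 18th field 6.25).
        List<String> lines = Arrays.asList(
                "250788965731,20090906,200937,200909,621,SUNDAY,WEEKEND,ON-NET,"
                + "MORNING,OUTGOING,VOICE,25078,PAY_AS_YOU_GO_PER_SECOND_PSB,"
                + "SUCCESSFUL-RELEASEDBYSERVICE,5,0,1,6.25,635-10-104-40163");
        firstColumns(lines).forEach(System.out::println);
    }
}
```

Note that .equals() is called with the literal first, so a malformed short line can't throw a NullPointerException.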

Hope that helps!

Upvotes: 3
