Suresh_Hadoop

Reputation: 147

Get Specific data from MapReduce

I have the following file as input, which consists of 10000 lines like this one:

250788965731,20090906,200937,200909,621,SUNDAY,WEEKEND,ON-NET,MORNING,OUTGOING,VOICE,25078,PAY_AS_YOU_GO_PER_SECOND_PSB,SUCCESSFUL-RELEASEDBYSERVICE,5,0,1,6.25,635-10-104-40163. 

I need to print the first column if the 18th column is less than 10 and the 9th column is MORNING. I wrote the following code, but I'm not getting any output; the output file is empty.

public static class MyMap extends Mapper<LongWritable, Text, Text, DoubleWritable> {

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        String[] day = line.split(",");
        double day1 = Double.parseDouble(day[17]);
        if (day[8] == "MORNING" && day1 < 10.0) {
            context.write(new Text(day[0]), new DoubleWritable(day1));
        }
    }
}
public static class MyReduce extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {

    public void reduce(Text key, Iterator<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {

        String no = values.toString();
        double no1 = Double.parseDouble(no);
        if (no1 > 10.0) {
            context.write(key, new DoubleWritable(no1));
        }
    }
}

Please tell me what I did wrong. Is the flow correct?

Upvotes: 3

Views: 2000

Answers (2)

Pramod Solanky

Reputation: 1700

  1. I believe this could be a mapper-only job, since your data already contains the values you want to check.
  2. Your mapper emits only values with day1 < 10.0, while your reducer writes only values with day1 > 10.0, so nothing is ever output by your reducers.

So I think your reducer should look like this:

String no = values.toString();
double no1 = Double.parseDouble(no);
if (no1 < 10.0) {
    context.write(key, new DoubleWritable(no1));
}

I think that should get your desired output.

Upvotes: 0

Charles Menguy

Reputation: 41428

I can see a few problems.

First, in your Mapper, you should use .equals() instead of == when comparing Strings. Otherwise you're only comparing references, and the comparison can fail even when the String objects' contents are identical. It might happen to succeed because of Java String interning, but I would avoid relying on that if equality of contents was the original intent.
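The difference is easy to demonstrate in plain Java; in this small sketch the literal and the split call simply stand in for "MORNING" and day[8]:

```java
public class StringCompare {
    public static void main(String[] args) {
        String literal = "MORNING";
        // A field produced by split(",") is a fresh String object,
        // not the interned literal, so == compares different references.
        String fromSplit = "X,MORNING".split(",")[1];

        System.out.println(literal == fromSplit);      // false: different objects
        System.out.println(literal.equals(fromSplit)); // true: same characters
    }
}
```

This is exactly why the mapper's `if` condition never fires even for lines that contain MORNING.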

In your Reducer, I am not sure what you want to achieve, but there are a few wrong things that I can spot anyway. The values parameter is an Iterable<DoubleWritable>, so you should iterate over it and apply whatever condition you need to each individual value. Here is how I would rewrite your Reducer:

public static class MyReduce extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {

    public void reduce(Text key, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {

        for (DoubleWritable val : values) {
            if (val.get() > 10.0) {
                context.write(key, val);
            }
        }
    }
}

But the overall logic doesn't make much sense. If all you want to do is print the first column when the 18th column is less than 10 and the 9th column is MORNING, then you could use a NullWritable as the output key of your mapper and write column 1, day[0], as your output value. You probably don't even need a Reducer in this case, which you can tell Hadoop with job.setNumReduceTasks(0);.

One thing that got me thinking: if your input is only 10k lines, do you really need a Hadoop job for this? It seems to me a simple shell script (for example with awk) would be enough for such a small dataset.
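To make that concrete, here is the same filter as a few lines of plain Java, no Hadoop required. The class and method names are made up for illustration, and the sample line is the one from the question; for the real file you would read the lines with java.nio.file.Files.readAllLines(...):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LocalFilter {

    // First column of every line whose 9th field is MORNING and whose
    // 18th field parses to a value below 10.
    static List<String> firstColumns(List<String> lines) {
        return lines.stream()
                .map(line -> line.split(","))
                .filter(f -> f.length > 17
                        && "MORNING".equals(f[8])
                        && Double.parseDouble(f[17]) < 10.0)
                .map(f -> f[0])
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample record from the question (9th field MORNING, 18th field 6.25).
        List<String> lines = Arrays.asList(
                "250788965731,20090906,200937,200909,621,SUNDAY,WEEKEND,ON-NET,"
                + "MORNING,OUTGOING,VOICE,25078,PAY_AS_YOU_GO_PER_SECOND_PSB,"
                + "SUCCESSFUL-RELEASEDBYSERVICE,5,0,1,6.25,635-10-104-40163");
        firstColumns(lines).forEach(System.out::println);
    }
}
```

Note that .equals() is called with the literal first, so a malformed short line can't throw a NullPointerException.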

Hope that helps!

Upvotes: 3
