Nader Hisham

Reputation: 5414

Not able to get the results I want from a MapReduce job

This is a sample of my data:

[image: sample of the input data]

I want to get the total sales per store from this file using MapReduce. Counting the first column as index 0, the store name is at index 2 and the revenue is at index 4.

This is my mapper code:

public void map(LongWritable key , Text value , Context context)
throws IOException , InterruptedException
{
    String line = value.toString();
    String[] columns = line.split("\t");

    if(columns.length == 6)
    {
        String storeNameString = columns[2];
        Text storeName = new Text(storeNameString);

        String storeRevenueString = columns[4];
        IntWritable storeRevenue = new IntWritable(Integer.parseInt(storeRevenueString));
        context.write(storeName, storeRevenue);
    }   
}

This is my reducer code:

public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException , InterruptedException {

    Text storeName = key;
    int storeSales = 0;

    while(values.iterator().hasNext())
    {
        storeSales += values.iterator().next().get();

    }
    context.write(storeName, new IntWritable(storeSales));
}

This is the code that runs the job:

public class StoreSales extends Configured implements Tool {

public static void main(String[] args) throws Exception {
    // this main function will call run method defined above.
    int res = ToolRunner.run(new StoreSales(),args);
    System.exit(res);
}

@Override
public int run(String[] args) throws Exception {
    // TODO Auto-generated method stub
    JobConf conf = new JobConf();

    @SuppressWarnings("unused")
    Job job = new Job(conf , "Sales Per Store");

    job.setMapperClass(StoreSalesMapper.class);
    job.setReducerClass(StoreSalesReducer.class);
    job.setJarByClass(StoreSales.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    Path input = new Path(args[0]);
    Path output = new Path(args[1]);

    FileInputFormat.addInputPath(conf , input);
    FileOutputFormat.setOutputPath(conf, output);

    JobClient.runJob(conf);

    return 0;
    }
 }

This is a sample of how the results should look:

[image: expected results]

This is the result I get:

[image: actual results]

What am I doing wrong?

Upvotes: 0

Views: 243

Answers (2)

Davis Broda

Reputation: 4125

I believe the problem may be in the line.split call. String.split interprets its argument as a regex, and the regex escape for a tab character is \t, which has to be written as "\\t" in Java source because the backslash itself must be escaped. Note, however, that "\t" (a string holding one literal tab character) is also a valid pattern that matches a tab, so switching to "\\t" makes the intent explicit but is not expected to change the result on its own.

Split call using the regex escape:

String[] columns = line.split("\\t");
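A quick way to see how String.split treats the two spellings is to run both on the same line. The sketch below is only an illustration: the class name TabSplitCheck, the helper splitWith, and the sample record are made up for this example, and the record assumes the six-column tab-separated layout discussed in this thread.

```java
import java.util.Arrays;

class TabSplitCheck {
    // Thin wrapper around String.split so the two patterns can be compared.
    public static String[] splitWith(String line, String regex) {
        return line.split(regex);
    }

    public static void main(String[] args) {
        // Hypothetical record in the six-column, tab-separated layout.
        String line = "2012-01-01\t09.00\tsanJose\tclothin\t214\tamex";

        // "\t" is a one-character string holding a literal tab.
        String[] literalTab = splitWith(line, "\t");
        // "\\t" is the two-character regex escape \t, which also matches a tab.
        String[] regexEscape = splitWith(line, "\\t");

        System.out.println(literalTab.length);                      // 6
        System.out.println(regexEscape.length);                     // 6
        System.out.println(Arrays.equals(literalTab, regexEscape)); // true
    }
}
```

If both calls return the same six columns, the split pattern is probably not what causes the unexpected output.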

Upvotes: 0

user3484461

Reputation: 1133

There is nothing wrong with your mapper and reducer logic. I reused it and changed only the driver program so that it uses the new MapReduce API (org.apache.hadoop.mapreduce) throughout:

Mapper part

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input record is one tab-separated line.
        String line = value.toString();
        String[] columns = line.split("\\t");

        // Skip lines that do not have exactly six columns.
        if (columns.length == 6) {
            // Store name is at index 2, revenue at index 4.
            Text storeName = new Text(columns[2]);
            IntWritable storeRevenue = new IntWritable(Integer.parseInt(columns[4]));
            context.write(storeName, storeRevenue);
        }
    }
}

Reducer part

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int storeSales = 0;

        // Sum every revenue value emitted for this store.
        for (IntWritable value : values) {
            storeSales += value.get();
        }
        context.write(key, new IntWritable(storeSales));
    }
}


Driver part

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Driver {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Sales Per Store");

        // Mapper, reducer and jar are all set on the Job that is submitted below.
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setJarByClass(Driver.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // args[0] is the input path, args[1] the output path.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submit the job, wait for it to finish, and report success or failure.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Sample input file (columns are tab-separated):

2012-01-01 09.00 sanJose clothin 214 amex

2012-01-01 09.00 seattle music 320 master

2012-01-01 09.00 seattle elec 3120 master

2012-01-01 09.00 sanJose perfume 3200 amex

Output file:

cat test123/part-r-00000

sanJose 3414

seattle 3440
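The totals above can be sanity-checked without a cluster. The following is a minimal plain-Java sketch that mirrors the mapper's split and six-column filter and the reducer's sum using ordinary collections instead of Hadoop types. The class name LocalSalesCheck and the method totalsPerStore are hypothetical, and the sample lines assume that the input columns are separated by tabs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LocalSalesCheck {
    // Sums column 4 (revenue) per column 2 (store), skipping lines
    // that do not have exactly six tab-separated columns.
    public static Map<String, Integer> totalsPerStore(List<String> lines) {
        // TreeMap keeps the stores sorted by name for stable output.
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            String[] columns = line.split("\t");
            if (columns.length == 6) {
                totals.merge(columns[2], Integer.parseInt(columns[4]), Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // The four sample records, assumed to be tab-separated.
        List<String> sample = Arrays.asList(
                "2012-01-01\t09.00\tsanJose\tclothin\t214\tamex",
                "2012-01-01\t09.00\tseattle\tmusic\t320\tmaster",
                "2012-01-01\t09.00\tseattle\telec\t3120\tmaster",
                "2012-01-01\t09.00\tsanJose\tperfume\t3200\tamex");

        // Prints one "store<TAB>total" line per store.
        totalsPerStore(sample).forEach(
                (store, total) -> System.out.println(store + "\t" + total));
    }
}
```

Running it on the four sample lines should give 3414 for sanJose and 3440 for seattle, matching the job output.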

Upvotes: 1
