heisenberg

Reputation: 73

Grouping a range of values in Map Reduce in Java Hadoop 2.2

My input data is in JSON format, with records like this:

    {
        "SeasonTicket": false,
        "name": "Vinson Foreman",
        "gender": "male",
        "age": 50,
        "email": "[email protected]",
        "annualSalary": "$98501.00",
        "id": 0
    }

I need to group the records by salary range, i.e. 1000-10000, 10000-50000 and so on, and count how many records fall into each range, for example:

Range        Count 
1000-10000     10
10000-50000    20
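The bucketing rule behind this table can live in one lookup instead of a growing `if`/`else` chain. This is a minimal sketch, assuming lower bounds are inclusive and upper bounds exclusive; the class name `SalaryRanges` and the ranges above 50000 (`50000-100000`, `100000+`) are hypothetical, extrapolated from the comparisons in the mapper further down. `TreeMap.floorEntry` returns the entry with the greatest lower bound that does not exceed the salary.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: maps a salary to its range label.
final class SalaryRanges {
    // Inclusive lower bound -> label; the next key is the (exclusive) upper bound.
    // Ranges above 50000 are assumptions, not taken from the question's table.
    private static final TreeMap<Double, String> LOWER_BOUNDS = new TreeMap<>();
    static {
        LOWER_BOUNDS.put(1000.0, "1000-10000");
        LOWER_BOUNDS.put(10000.0, "10000-50000");
        LOWER_BOUNDS.put(50000.0, "50000-100000");
        LOWER_BOUNDS.put(100000.0, "100000+");
    }

    private SalaryRanges() {}

    /** Returns the range label for a salary, or null if it is below the lowest bound. */
    static String rangeFor(double salary) {
        Map.Entry<Double, String> e = LOWER_BOUNDS.floorEntry(salary);
        return e == null ? null : e.getValue();
    }
}
```

For example, `SalaryRanges.rangeFor(98501.00)` yields `"50000-100000"`, and a salary below 1000 yields `null`, which a mapper could use to skip the record. Adding a range means adding one `put` call.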

I am not using a JSON parser (such as Jackson) to process the data; instead, I parse each line as a String. These are my map and reduce functions:

Map function

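import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class DemoMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final IntWritable v = new IntWritable(1);
    private final Text k = new Text();

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // With the default TextInputFormat, each call receives one line of the file.
        String line = value.toString();
        if (!line.contains("annualSalary")) {
            return;
        }
        try {
            // Keep only digits and '.' after the colon: "annualSalary": "$98501.00", -> 98501.00
            // (replaceAll("$", "") does not work: "$" is a regex end-of-input anchor.)
            String t = line.substring(line.indexOf(':') + 1).replaceAll("[^0-9.]", "");
            double x = Double.parseDouble(t);

            // The key is the range label, so the reducer counts records per range.
            // Lower bounds are inclusive so boundary values (e.g. 10000) are not dropped.
            if (x >= 1000 && x < 10000) {
                k.set("1000-10000");
            } else if (x >= 10000 && x < 50000) {
                k.set("10000-50000");
            } else if (x >= 50000 && x < 100000) {
                k.set("50000-100000");
            } else if (x >= 100000) {
                k.set("100000+");
            } else {
                return; // below 1000: outside every range
            }
            // Emit exactly once per record.
            output.collect(k, v);
        } catch (NumberFormatException ex) {
            ex.printStackTrace();
        }
    }
}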

Reduce function

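import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class DemoReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable count = new IntWritable();

    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // Sum the 1s emitted by the mapper for this key.
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        count.set(sum);
        output.collect(key, count);
    }
}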
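To check what the job should output without a cluster, the shuffle and `DemoReducer` can be mimicked in memory. This is only a local sketch, assuming the mapper emits `(rangeLabel, 1)` pairs; the class `LocalReduce` is hypothetical and not part of Hadoop. Pairs are grouped by key in sorted order (Hadoop also sorts keys before calling `reduce`) and their values are summed.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local stand-in for shuffle + DemoReducer, for testing without Hadoop.
final class LocalReduce {
    private LocalReduce() {}

    /**
     * Groups (key, value) pairs by key and sums the values per key.
     * TreeMap keeps keys sorted, similar to Hadoop's sort phase.
     */
    static Map<String, Integer> shuffleAndSum(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }
}
```

Because keys are compared as strings, the result is ordered lexicographically, so `100000+` sorts before `50000-100000`; ASCII `Text` keys are ordered the same way in the real job.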

Please let me know how to group this data without using a JSON parser, if possible.
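Because the sample input holds one `"field": value` pair per line, individual fields can be read with plain `String` operations and no JSON parser. The sketch below assumes that layout (flat records, one pair per line; nested objects are not handled), and the class `JsonLineFields` and its methods are hypothetical. It also sidesteps two pitfalls of string parsing here: a fixed offset such as `substring(26)` breaks as soon as the indentation changes, and `replaceAll("$", "")` leaves the dollar sign in place because `replaceAll` treats `$` as a regex end-of-input anchor, whereas `String.replace` matches it literally.

```java
import java.util.Optional;

// Hypothetical helper for reading fields from one-pair-per-line JSON without a parser.
final class JsonLineFields {
    private JsonLineFields() {}

    /**
     * Returns the raw value of a line such as  "age": 50,  with surrounding quotes and
     * the trailing comma removed, or Optional.empty() if the line does not hold that field.
     */
    static Optional<String> valueOf(String line, String field) {
        String trimmed = line.trim();
        String prefix = "\"" + field + "\"";
        if (!trimmed.startsWith(prefix)) {
            return Optional.empty();
        }
        String rest = trimmed.substring(prefix.length()).trim();
        if (!rest.startsWith(":")) {
            return Optional.empty();
        }
        String raw = rest.substring(1).trim();
        if (raw.endsWith(",")) {
            raw = raw.substring(0, raw.length() - 1).trim();
        }
        if (raw.length() >= 2 && raw.startsWith("\"") && raw.endsWith("\"")) {
            raw = raw.substring(1, raw.length() - 1);
        }
        return Optional.of(raw);
    }

    /**
     * Parses a salary such as "$98501.00". String.replace matches "$" literally;
     * Double.parseDouble throws NumberFormatException for malformed values.
     */
    static Optional<Double> salaryOf(String line) {
        return valueOf(line, "annualSalary").map(v -> Double.parseDouble(v.replace("$", "")));
    }
}
```

In a mapper, `salaryOf(value.toString())` would return `Optional.empty()` for every line other than the salary line, so those lines can simply be skipped.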

Upvotes: 0

Views: 1085

Answers (1)

TheCowGoesMoo

Reputation: 176

To group the data by range, you can use a custom partitioner (example).
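A partitioner decides which reducer receives each map output key. In the old `org.apache.hadoop.mapred` API used in the question, that would be a class implementing `Partitioner<Text, IntWritable>` (with `getPartition(Text key, IntWritable value, int numPartitions)` plus `configure(JobConf)` inherited from `JobConfigurable`), registered via `JobConf.setPartitionerClass(...)`. The rule such a `getPartition` could apply is sketched below in plain Java so it can be tried without Hadoop; it assumes the mapper emits range labels as keys, and the class `RangePartitioning` and its label list are hypothetical. With `JobConf.setNumReduceTasks(4)`, each of the four ranges would go to its own reducer and output file.

```java
import java.util.List;

// Hypothetical partitioning rule; in a real job this logic would sit in getPartition.
final class RangePartitioning {
    // Assumed fixed order of range labels: label at index i goes to partition i.
    static final List<String> RANGES =
            List.of("1000-10000", "10000-50000", "50000-100000", "100000+");

    private RangePartitioning() {}

    /**
     * Returns a partition in [0, numPartitions). Known ranges get their own partition
     * when numPartitions >= RANGES.size(); unknown keys fall back to a hash-based rule.
     */
    static int partitionFor(String rangeLabel, int numPartitions) {
        int idx = RANGES.indexOf(rangeLabel);
        if (idx >= 0) {
            return idx % numPartitions;
        }
        // Mask the sign bit so the result is never negative.
        return (rangeLabel.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

The fallback keeps unexpected keys inside `[0, numPartitions)`, which `getPartition` must always guarantee.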

Upvotes: 1
