Sara

Reputation: 11

How to sort values (with their corresponding keys) in the Hadoop MapReduce framework?

I am trying to sort my input data using Hadoop MapReduce. The problem is that I can only sort the key-value pairs by key, while I want to sort them by value. Each value's key is created with a counter, so the first value (234) has key 1, the second value (944) has key 2, and so on. How can I order the input by value instead?


import java.io.IOException;
import java.util.StringTokenizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Collections;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Sortt {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    // Running counter used as the key; it is local to each mapper instance.
    private int k = 0;
    private final Text ke = new Text();
    private final IntWritable val = new IntWritable();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        val.set(Integer.parseInt(itr.nextToken()));
        k = k + 1;
        ke.set(Integer.toString(k));
        // Emits (counter, number); the shuffle sorts on the Text key, not the value.
        context.write(ke, val);
      }
    }
  }


  public static class SortReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable va = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      // Collect and sort the values that share this key. When each key carries a
      // single value (as with the counter keys from one mapper), the sort has no
      // visible effect.
      List<Integer> sorted = new ArrayList<Integer>();
      for (IntWritable val : values) {
        sorted.add(val.get());
      }
      Collections.sort(sorted);
      for (int v : sorted) {
        va.set(v);
        context.write(key, va);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    long startTime = System.currentTimeMillis();
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "sort");
    job.setJarByClass(Sortt.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(SortReducer.class);
    job.setReducerClass(SortReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    boolean ok = job.waitForCompletion(true);
    // Report elapsed time before exiting; statements after System.exit never run.
    long duration = System.currentTimeMillis() - startTime;
    System.out.println("time=" + duration + "ms");
    System.exit(ok ? 0 : 1);
  }
}

Input:

234

944

241

130

369

470

250

100

250

735

856

659

425

756

123

756

459

754

654

951

753

254

698

741

Expected Output:

8 100

15 123

4 130

1 234

3 241

7 250

9 250

22 254

5 369

13 425

17 459

6 470

19 654

12 659

23 698

10 735

24 741

21 753

18 754

14 756

16 756

11 856

2 944

20 951

Current Output:

1 234

10 735

11 856

12 659

13 425

14 756

15 123

16 756

17 459

18 754

19 654

2 944

20 951

21 753

22 254

23 698

24 741

3 241

4 130

5 369

6 470

7 250

8 100

9 250

Upvotes: 0

Views: 3302

Answers (1)

subodh

Reputation: 6158

MapReduce sorts its output by key by default. To sort by value, you can use a secondary sort, which is one of the best techniques for ordering reducer output by value. Here is one complete example.
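
Separately from the linked example, the core of the idea can be sketched in plain Java without any Hadoop dependency. The framework only sorts on the map output key, and `Text` keys compare as strings, which is why the current output runs 1, 10, 11, ..., 2. If the number to sort on becomes part of the key, the shuffle does the ordering for you. The class and method names below (`ValueSortSketch`, `ValueIndexKey`, `sortByValue`) are hypothetical and only illustrate the comparison logic; in a real job the composite key would implement `org.apache.hadoop.io.WritableComparable` and be emitted as the map output key, and `Collections.sort` stands in for the shuffle.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ValueSortSketch {

    // Hypothetical composite key: the number to sort on plus its original position.
    // In a Hadoop job this role would be played by a WritableComparable key class.
    static final class ValueIndexKey implements Comparable<ValueIndexKey> {
        final int value;
        final int index;

        ValueIndexKey(int value, int index) {
            this.value = value;
            this.index = index;
        }

        @Override
        public int compareTo(ValueIndexKey other) {
            // Order by value first, then numerically by the counter to break ties.
            int byValue = Integer.compare(value, other.value);
            return byValue != 0 ? byValue : Integer.compare(index, other.index);
        }
    }

    // Stand-in for map + shuffle + reduce: build the keys, sort them, format the lines.
    static List<String> sortByValue(int[] values) {
        List<ValueIndexKey> keys = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            keys.add(new ValueIndexKey(values[i], i + 1)); // counter starts at 1
        }
        Collections.sort(keys);
        List<String> lines = new ArrayList<>();
        for (ValueIndexKey k : keys) {
            lines.add(k.index + " " + k.value);
        }
        return lines;
    }

    public static void main(String[] args) {
        int[] sample = {234, 944, 241, 130, 369, 470, 250, 100, 250};
        for (String line : sortByValue(sample)) {
            System.out.println(line);
        }
    }
}
```

If the counter is only needed for display, a simpler variant is to emit the number itself as an `IntWritable` key, since `IntWritable` compares numerically, and carry the counter as the value. Either way, a single globally ordered output also assumes one reducer or a total-order partitioner such as `TotalOrderPartitioner`.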

Upvotes: 1
