user2201536

Reputation: 109

Custom Partitioner: N keys to N different files

My requirement is to write a custom partitioner. I have N keys coming from the mapper, for example ('jsa', 'msa', 'jbac'). The length is not fixed; it can be any word, in fact. I need the custom partitioner to collect all the data for the same key into the same file. The number of keys is not fixed. Thank you in advance.

Thanks, Sathish.

Upvotes: 1

Views: 4931

Answers (2)

tylerhawkes

Reputation: 81

I imagine the best way to do this, since it will give a more even split, would be:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class CustomPartitioner extends Partitioner<Text, Text>
    {
        @Override
        public int getPartition(Text key, Text value, int numReduceTasks)
        {
            // Mask the sign bit so the result stays in 0 .. numReduceTasks - 1,
            // even when hashCode() is negative.
            return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }
    }

Upvotes: 0

user1188611

Reputation: 955

So you have multiple keys that the mapper is outputting, and you want a different reducer for each key, with a separate output file per key.

So the first thing: writing a Partitioner can be a way to achieve that. By default, Hadoop applies its own internal logic to each key and, depending on the result, decides which reducer gets called. If you want a custom partitioner, you have to override that default behaviour with your own logic/algorithm. Unless you know exactly how your keys will vary, this logic won't be generic; you have to figure it out based on the variations you expect.
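For reference, that default behaviour is roughly what Hadoop's built-in HashPartitioner (org.apache.hadoop.mapreduce.lib.partition.HashPartitioner) does. A minimal sketch of that logic (the class name here is just illustrative):

import org.apache.hadoop.mapreduce.Partitioner;

// Roughly the logic of the built-in HashPartitioner: hash the key,
// mask the sign bit, and take the remainder by the reducer count.
public class DefaultStylePartitioner<K, V> extends Partitioner<K, V>
{
    @Override
    public int getPartition(K key, V value, int numReduceTasks)
    {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}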

I am providing a sample example here that you can refer to, but it's not generic.

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class CustomPartitioner extends Partitioner<Text, Text>
{
    @Override
    public int getPartition(Text key, Text value, int numReduceTasks)
    {
        // Partition numbers must be in the range 0 .. numReduceTasks - 1,
        // so this job needs at least 6 reduce tasks.
        if (key.toString().contains("Key1"))
        {
            return 0;
        } else if (key.toString().contains("Key2"))
        {
            return 1;
        } else if (key.toString().contains("Key3"))
        {
            return 2;
        } else if (key.toString().contains("Key4"))
        {
            return 3;
        } else if (key.toString().contains("Key5"))
        {
            return 4;
        } else
        {
            return 5;
        }
    }
}

This should solve your problem. Just replace Key1, Key2, etc. with your key names, and keep in mind that the partition number returned must be less than the number of reduce tasks configured for the job.
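To wire this up, you also need to register the partitioner and set the reducer count in the job driver. A minimal sketch using the new mapreduce API; MyMapper and MyReducer are placeholders for your own classes:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PartitionDriver
{
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "per-key partitioning");
        job.setJarByClass(PartitionDriver.class);

        job.setMapperClass(MyMapper.class);       // your mapper class
        job.setReducerClass(MyReducer.class);     // your reducer class
        job.setPartitionerClass(CustomPartitioner.class);

        // One reducer per partition returned above, so at least 6 here.
        job.setNumReduceTasks(6);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}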

In case you don't know the key names, you can write your own logic along the following lines:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class CustomPartitioner extends Partitioner<Text, Text>
{
    @Override
    public int getPartition(Text key, Text value, int numReduceTasks)
    {
        // Derive a number from the key (here its first character) and
        // map it into the valid partition range 0 .. numReduceTasks - 1.
        return key.toString().charAt(0) % numReduceTasks;
    }
}

In the above partitioner, just to illustrate how you can write your own logic, I have shown that if you take a number derived from the key (here its first character) and do a % operation with the number of reducers, you get a number between 0 and the number of reducers minus one, so different reducers get called and the output goes to different files. But with this approach you have to make sure that two different keys never produce the same partition number, otherwise their data will end up in the same file.

That was about the customized partitioner.

Another solution is to override the MultipleOutputFormat class methods, which enables you to do the job in a generic way. Using this approach you will also be able to generate customized file names for the reducer output files in HDFS.
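For example, in the newer mapreduce API the counterpart is MultipleOutputs (org.apache.hadoop.mapreduce.lib.output.MultipleOutputs). A minimal sketch of a reducer that writes each key's data under a file name derived from the key, assuming Text keys and values:

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class PerKeyReducer extends Reducer<Text, Text, Text, Text>
{
    private MultipleOutputs<Text, Text> outputs;

    @Override
    protected void setup(Context context)
    {
        outputs = new MultipleOutputs<Text, Text>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException
    {
        for (Text value : values)
        {
            // The third argument is the base name of the output file, so each
            // key's records end up in their own file (e.g. jsa-r-00000).
            outputs.write(key, value, key.toString());
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException
    {
        outputs.close();
    }
}

With this approach the file name comes from the key itself, so the number of reduce tasks no longer has to match the number of distinct keys.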

NOTE: Make sure you use the same set of libraries. Don't mix the mapred and mapreduce libraries: org.apache.hadoop.mapred contains the older API and org.apache.hadoop.mapreduce the newer one.

Hope this will help.

Upvotes: 2
