Reputation: 11
I am working on MapReduce with the WordCount example.
Input data:
text files
Output:
term: fileName occurrences
Map output:
Term:filename 1 1 1 1 1
Reduce output:
Term: filename occurrences
Example of the current final (reducer) output:
Iphone: file1 4
Iphone: file2 3
Galaxy: file1 2
Htc: file1 3
Htc: file2 5
Output I want:
Iphone: file1=4 file2=3
Galaxy: file1=2
Htc: file1=3 file2=5
How can I get this output? I thought about using a partitioner, but I don't know how to do that. Any suggestions? Thanks in advance.
Upvotes: 1
Views: 307
Reputation: 1006
There are various ways to achieve the output you want, but since you mentioned doing it with a partitioner, let's do it that way.
According to your question, you need a partitioner that divides the map output on the part of the key you care about, which is the term (Iphone, Galaxy, etc.). I am assuming that your map output key and value types are both Text; if not, change the code accordingly. This is what your partitioner should look like:
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class Partitioners extends Partitioner<Text, Text> {
    // Written for 3 reducers, since the example has 3 distinct terms.
    // Tip: the number of reducers should equal the number of batches you want to divide the map output into.
    @Override
    public int getPartition(Text key, Text value, int numReduceTasks) {
        String sKey = key.toString();
        // Adjust this to your key layout; it assumes the composite key "Term:filename" from the question.
        int sep = sKey.indexOf(':');
        // Fall back to the whole key when there is no ':' (substring(0, -1) would throw).
        String term = sep < 0 ? sKey : sKey.substring(0, sep);
        if (term.equals("Iphone")) {
            // Keys with the term Iphone go to the first reducer (partition 0).
            return 0;
        } else if (term.equals("Galaxy")) {
            // Keys with the term Galaxy go to the second reducer (partition 1).
            return 1;
        } else {
            // All other terms (only Htc in your case) go to the third reducer (partition 2).
            return 2;
        }
    }
}
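If the terms are not known in advance, hard-coding one branch per term will not scale. Here is a minimal sketch of a more general rule, assuming the same Term:filename key layout: hash only the term and take it modulo the number of reducers (the same sign-bit masking that Hadoop's default HashPartitioner applies to the whole key). It is plain Java with no Hadoop classes so it can be tried on its own; the names TermPartitionSketch, termOf and partitionFor are made up for illustration.

```java
// Sketch only: plain Java, no Hadoop classes, so the logic can be tried in isolation.
final class TermPartitionSketch {

    // Extracts the term from a composite key such as "Iphone:file1".
    // Assumption: ':' separates term and file name; a key without ':' is treated as the term itself.
    static String termOf(String compositeKey) {
        int sep = compositeKey.indexOf(':');
        return sep < 0 ? compositeKey : compositeKey.substring(0, sep);
    }

    // Maps a composite key to a partition in [0, numReduceTasks) using only the term.
    // Masking with Integer.MAX_VALUE clears the sign bit, so the result is never negative.
    static int partitionFor(String compositeKey, int numReduceTasks) {
        return (termOf(compositeKey).hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"Iphone:file1", "Iphone:file2", "Galaxy:file1", "Htc:file2"};
        for (String key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key, 3));
        }
    }
}
```

Inside a real Partitioner, the body of getPartition would call the same expression on key.toString(). Keys that share a term always land in the same partition, whatever number of reducers is configured, but two different terms may also share one.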
Once the partitioner is written, the driver has to be told about it, so add the following to your driver class:
job.setPartitionerClass(Partitioners.class);
job.setNumReduceTasks(3); //since we want 3 reducers
This divides your map output into 3 partitions, and you can then reduce the output accordingly in your reducer class.
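Keep in mind that a partitioner only decides which reducer a key is sent to, so the per-file counts of a term still have to be joined into one line. Here is a rough sketch of that merge step in plain Java (no Hadoop classes, so it runs on its own). It assumes each record looks like Term:fileName, then a tab, then the count; the names TermMergeSketch and mergeByTerm are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: shows the merge logic, not a Hadoop Reducer.
final class TermMergeSketch {

    // Merges records of the assumed form "Term:fileName<TAB>count" into one entry per term,
    // e.g. "Iphone" -> "file1=4 file2=3". Insertion order of the terms is preserved.
    static Map<String, String> mergeByTerm(List<String> records) {
        Map<String, StringBuilder> merged = new LinkedHashMap<>();
        for (String record : records) {
            String[] keyAndCount = record.split("\t");        // [ "Term:fileName", "count" ]
            String[] termAndFile = keyAndCount[0].split(":"); // [ "Term", "fileName" ]
            String term = termAndFile[0].trim();
            String file = termAndFile[1].trim();
            StringBuilder line = merged.computeIfAbsent(term, t -> new StringBuilder());
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(file).append('=').append(keyAndCount[1].trim());
        }
        Map<String, String> result = new LinkedHashMap<>();
        merged.forEach((term, line) -> result.put(term, line.toString()));
        return result;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "Iphone:file1\t4", "Iphone:file2\t3", "Galaxy:file1\t2", "Htc:file1\t3", "Htc:file2\t5");
        mergeByTerm(records).forEach((term, line) -> System.out.println(term + ": " + line));
    }
}
```

In an actual job the same joining would happen inside reduce(), which is simplest if the map output key is the term alone and the file name travels in the value.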
I hope this solves your problem. If not, let me know.
Upvotes: 1