Reputation: 11
In a Hadoop secondary sort, the composite key class implements WritableComparable and has the following method to compare keys:
    @Override
    public int compareTo(CustomKey o) {
        int result = firstName.compareTo(o.getFirstName());
        log.debug("value is " + result);
        if (result == 0) {
            return lastName.compareTo(o.getLastName());
        }
        return result;
    }
The custom sorter that we create to perform the secondary sort extends WritableComparator, and its code looks like this:
    @Override
    public int compare(WritableComparable w1, WritableComparable w2) {
        CustomKey key1 = (CustomKey) w1;
        CustomKey key2 = (CustomKey) w2;
        int value = key1.getFirstName().compareTo(key2.getFirstName());
        if (value == 0) {
            // Negated, so last names sort in descending order here.
            return -key1.getLastName().compareTo(key2.getLastName());
        }
        return value;
    }
I want to know why we compare the keys twice for sorting: once in the CustomKey class by implementing WritableComparable, and then again in a separate CustomSorter class that extends WritableComparator.
Upvotes: 0
Views: 782
Reputation: 1
Your custom sorter is needed only under two conditions: 1) the sort order in the CustomSorter class differs from that of the compareTo method in your CompositeKey class, and 2) you want the CustomSorter class's sorting logic to take precedence. If these conditions are not met, your CompositeKey class will suffice for sorting.
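A rough sketch of that point in plain Java, with no Hadoop on the classpath: Comparable stands in for WritableComparable and a Comparator stands in for the WritableComparator subclass. The Name class, the CUSTOM_SORTER constant and the sample names are made up for illustration. The two orderings mirror the ones in the question (last name ascending in compareTo, descending in the custom sorter), so the results differ only inside each first-name block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortOrderSketch {

    // Hypothetical stand-in for the question's CustomKey.
    static final class Name implements Comparable<Name> {
        final String firstName;
        final String lastName;

        Name(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }

        // Natural order: first name ascending, then last name ascending.
        @Override
        public int compareTo(Name o) {
            int result = firstName.compareTo(o.firstName);
            return result != 0 ? result : lastName.compareTo(o.lastName);
        }

        @Override
        public String toString() {
            return firstName + " " + lastName;
        }
    }

    // Stand-in for the custom sorter: first name ascending, last name descending.
    static final Comparator<Name> CUSTOM_SORTER = (a, b) -> {
        int value = a.firstName.compareTo(b.firstName);
        return value != 0 ? value : b.lastName.compareTo(a.lastName);
    };

    static List<Name> sample() {
        return new ArrayList<>(Arrays.asList(
                new Name("Bob", "Young"),
                new Name("Ann", "Adams"),
                new Name("Ann", "Zane"),
                new Name("Bob", "Baker")));
    }

    // Sorts with compareTo only, as happens when no separate sorter is supplied.
    public static String naturalOrder() {
        List<Name> names = sample();
        Collections.sort(names);
        return names.toString();
    }

    // Sorts with the explicit comparator, which takes precedence over compareTo.
    public static String customOrder() {
        List<Name> names = sample();
        names.sort(CUSTOM_SORTER);
        return names.toString();
    }

    public static void main(String[] args) {
        System.out.println(naturalOrder()); // [Ann Adams, Ann Zane, Bob Baker, Bob Young]
        System.out.println(customOrder());  // [Ann Zane, Ann Adams, Bob Young, Bob Baker]
    }
}
```

If both comparisons implemented the same order, the second one would add nothing, which is the situation where the key class alone is enough.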
Upvotes: 0
Reputation: 129
I am not sure where the code you refer to is taken from, so I will try to answer in a generic way.
Here is an extract from Hadoop: The Definitive Guide on secondary sorting:
Grouping similar keys is very efficient when they are sorted. The grouping comparator is meant for this: it helps to efficiently identify the runs of keys that are similar.
Ex: Assume that you get the following (composite) keys out of your mapper:
A,1
B,2
A,2
B,3
For grouping to work, these keys are first sorted so that similar keys sit next to each other, like below:
A,1
A,2
B,2
B,3
To get secondary sorting to work, you then need to sort on the value part as well. That is what the SortingComparator achieves.
The final output would be (provided you have a partitioner that partitions on the key part of the composite key):
A,2
A,1
B,3
B,2
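The same example can be sketched in plain Java, without any Hadoop classes, to show the two roles side by side. The Pair class, the SORT comparator and the sortAndGroup method are hypothetical names used only for illustration; SORT plays the part of the sort comparator (key ascending, value descending) and the grouping step collects records that share the key part, roughly as a grouping comparator on the natural key would.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortSketch {

    // Hypothetical composite key: the natural key plus the value to order by.
    static final class Pair {
        final String key;
        final int value;

        Pair(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // Role of the sort comparator: key ascending, then value descending.
    static final Comparator<Pair> SORT = (a, b) -> {
        int byKey = a.key.compareTo(b.key);
        return byKey != 0 ? byKey : Integer.compare(b.value, a.value);
    };

    // Sorts the records, then groups them by the key part only,
    // so each group keeps its values in the sorted (descending) order.
    public static Map<String, List<Integer>> sortAndGroup(List<Pair> mapperOutput) {
        List<Pair> sorted = new ArrayList<>(mapperOutput);
        sorted.sort(SORT);
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (Pair p : sorted) {
            groups.computeIfAbsent(p.key, k -> new ArrayList<>()).add(p.value);
        }
        return groups;
    }

    // The four composite keys from the example: A,1  B,2  A,2  B,3.
    public static Map<String, List<Integer>> demo() {
        return sortAndGroup(Arrays.asList(
                new Pair("A", 1),
                new Pair("B", 2),
                new Pair("A", 2),
                new Pair("B", 3)));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // {A=[2, 1], B=[3, 2]}
    }
}
```

Each group corresponds to one reduce call in this simplified picture: the reducer for A would see 2 then 1, and the reducer for B would see 3 then 2.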
Upvotes: 0