user3837453

Reputation: 11

Hadoop Secondary Sort Composite key compareTo vs Custom Sorter compare implementations

In a Hadoop secondary sort, the composite key class implements WritableComparable and has the following method to compare keys:

@Override
public int compareTo(CustomKey o) {
    // Primary order: first name.
    int result = firstName.compareTo(o.getFirstName());
    log.debug("value is " + result);
    // Tie-break on last name, ascending.
    if (result == 0) {
        return lastName.compareTo(o.getLastName());
    }
    return result;
}

The custom sorter that we create to perform the secondary sort extends WritableComparator, and its code looks like this:

@Override
public int compare(WritableComparable w1, WritableComparable w2) {
    CustomKey key1 = (CustomKey) w1;
    CustomKey key2 = (CustomKey) w2;
    // Primary order: first name, same as compareTo.
    int value = key1.getFirstName().compareTo(key2.getFirstName());
    if (value == 0) {
        // Negated, so last names sort in descending order.
        return -key1.getLastName().compareTo(key2.getLastName());
    }
    return value;
}

I want to know why we compare the keys twice: once in the CustomKey class, by implementing WritableComparable, and then again in a separate CustomSorter class that extends WritableComparator.

Upvotes: 0

Views: 782

Answers (2)

Dev

Reputation: 1

Your custom sorter is needed only under two conditions: 1) the ordering in the CustomSorter class differs from that of the compareTo method in your CompositeKey class; 2) you want the CustomSorter ordering to take precedence. If these conditions are not met, your CompositeKey class is sufficient for sorting.
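As a rough illustration of the first condition, here is a sketch in plain Java collections rather than Hadoop itself. `NameKey` and `CUSTOM_SORTER` are hypothetical stand-ins for the question's CustomKey and CustomSorter, and the sample names are made up. It only shows that an explicitly supplied comparator replaces the ordering defined by compareTo.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for CustomKey: natural order is first name, then last name ascending.
final class NameKey implements Comparable<NameKey> {
    final String firstName;
    final String lastName;

    NameKey(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    @Override
    public int compareTo(NameKey o) {
        int result = firstName.compareTo(o.firstName);
        return result != 0 ? result : lastName.compareTo(o.lastName);
    }

    @Override
    public String toString() {
        return firstName + " " + lastName;
    }
}

class SorterOverrideDemo {
    // Hypothetical stand-in for CustomSorter: same first-name order, last names descending.
    static final Comparator<NameKey> CUSTOM_SORTER = (a, b) -> {
        int value = a.firstName.compareTo(b.firstName);
        return value != 0 ? value : b.lastName.compareTo(a.lastName);
    };

    // Sorts a copy of the keys; a null sorter means "fall back to compareTo".
    static List<String> sortNames(List<NameKey> keys, Comparator<NameKey> sorter) {
        List<NameKey> copy = new ArrayList<>(keys);
        copy.sort(sorter);
        List<String> out = new ArrayList<>();
        for (NameKey k : copy) {
            out.add(k.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<NameKey> keys = Arrays.asList(
            new NameKey("John", "Doe"),
            new NameKey("Amy", "Adams"),
            new NameKey("John", "Smith"));
        System.out.println(sortNames(keys, null));          // [Amy Adams, John Doe, John Smith]
        System.out.println(sortNames(keys, CUSTOM_SORTER)); // [Amy Adams, John Smith, John Doe]
    }
}
```

If the two orderings were identical, passing the comparator would change nothing, which is the case where the key's own compareTo is enough.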

Upvotes: 0

Ramu Malur

Reputation: 129

I am not sure where the code you refer to is taken from.

I will try to answer in a generic way.

Here is an extract from Hadoop: The Definitive Guide on secondary sorting:

  1. Make the key a composite of the natural key and the natural value.
  2. The Sort comparator should order by the composite key, that is, the natural key and natural value.
  3. The Partitioner and Grouping comparator for the composite key should consider only the natural key for partitioning and grouping.
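The three steps above can be sketched in plain Java, without the Hadoop classes, to show which part of the key each piece looks at. `PairKey`, `SORT`, `GROUP` and `partition` are hypothetical names chosen for this illustration; in a real job these roles would be filled by a WritableComparable key, comparators and a Partitioner.

```java
import java.util.Comparator;

// Step 1: hypothetical composite key made of the natural key and the natural value.
final class PairKey {
    final String naturalKey;
    final int naturalValue;

    PairKey(String naturalKey, int naturalValue) {
        this.naturalKey = naturalKey;
        this.naturalValue = naturalValue;
    }
}

class SecondarySortRoles {
    // Step 2: the sort comparator orders by natural key, then by natural value.
    static final Comparator<PairKey> SORT =
        Comparator.comparing((PairKey k) -> k.naturalKey)
                  .thenComparingInt(k -> k.naturalValue);

    // Step 3: the grouping comparator considers only the natural key.
    static final Comparator<PairKey> GROUP =
        Comparator.comparing((PairKey k) -> k.naturalKey);

    // Step 3: the partitioner also considers only the natural key.
    static int partition(PairKey key, int numPartitions) {
        return (key.naturalKey.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        PairKey a1 = new PairKey("A", 1);
        PairKey a2 = new PairKey("A", 2);
        System.out.println(SORT.compare(a1, a2) < 0);             // true: A,1 sorts before A,2
        System.out.println(GROUP.compare(a1, a2) == 0);           // true: same reduce group
        System.out.println(partition(a1, 4) == partition(a2, 4)); // true: same partition
    }
}
```

The point of the split is that the two keys are different for sorting but identical for partitioning and grouping, so they reach the same reducer and the same reduce call in a known order.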

Grouping similar keys is very efficient when they are sorted. The grouping comparator is meant for this: it helps to efficiently identify the runs of keys that are similar.

Ex: Assume you get the following (composite) keys out of your mapper.

A,1

B,2

A,2

B,3

Ordered by the natural key, so that the grouping comparator can treat equal keys as one group, they look like this:

A,1

A,2

B,2

B,3

For secondary sorting to work, you then need to sort on the value part as well. That is what the sort comparator achieves.

The final output would be (provided you have a partitioner that partitions on the key part of the composite key):

A,2

A,1

B,3

B,2
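To make the example concrete, here is a small plain-Java simulation of what happens to those four keys. It is only a sketch of the idea, not Hadoop's actual shuffle code; `CompositeKey`, `SORT`, `GROUP` and `shuffle` are hypothetical names, and the descending order on the value is assumed to match the negated comparison in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical composite key: natural key plus the value to be secondary-sorted.
final class CompositeKey {
    final String key;
    final int value;

    CompositeKey(String key, int value) {
        this.key = key;
        this.value = value;
    }
}

class ShuffleSimulation {
    // Sort comparator: natural key ascending, then value descending.
    static final Comparator<CompositeKey> SORT = (a, b) -> {
        int c = a.key.compareTo(b.key);
        return c != 0 ? c : Integer.compare(b.value, a.value);
    };

    // Grouping comparator: natural key only.
    static final Comparator<CompositeKey> GROUP = (a, b) -> a.key.compareTo(b.key);

    // Sorts the keys, then starts a new group whenever GROUP reports a change of key.
    static Map<String, List<Integer>> shuffle(List<CompositeKey> mapOutput) {
        List<CompositeKey> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT);
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        CompositeKey previous = null;
        List<Integer> current = null;
        for (CompositeKey k : sorted) {
            if (previous == null || GROUP.compare(previous, k) != 0) {
                current = new ArrayList<>();
                groups.put(k.key, current);
            }
            current.add(k.value);
            previous = k;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> mapOutput = Arrays.asList(
            new CompositeKey("A", 1), new CompositeKey("B", 2),
            new CompositeKey("A", 2), new CompositeKey("B", 3));
        System.out.println(shuffle(mapOutput)); // {A=[2, 1], B=[3, 2]}
    }
}
```

Each map entry corresponds to one reduce call: the grouping comparator decides where a group ends, and the sort comparator decides the order of the values inside it.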

Upvotes: 0
