Creating composite key class for Secondary Sort

I am trying to create a composite key class of a String uniqueCarrier and int month for Secondary Sort. Can anyone tell me, what are the steps for the same.

Upvotes: 0

Answers (2)

Matt Fortier

Reputation: 1223

Your compareTo() implementation is incorrect. You need to sort first on uniqueCarrier, then on month to break equality:

@Override
public int compareTo(CompositeKey other) {
    if (this.getUniqueCarrier().equals(other.getUniqueCarrier())) {
        return this.getMonth().compareTo(other.getMonth());
    } else {
        return this.getUniqueCarrier().compareTo(other.getUniqueCarrier());
    }
}

One suggestion though: I typically choose to implement my attributes directly as Writable types if possible (for example, IntWriteable month and Text uniqueCarrier). This allows me to call write and readFields directly on them, and also use their compareTo. Less code to write is always good...

Speaking of less code, you don't have to call the parent constructor for your composite key.

Now for what is left to be done:

My guess is you are still missing a hashCode() method, which should only return the hash of the attribute you want to group on, in this case uniqueCarrier. This method is called by the default Hadoop partitionner to distribute work across reducers.

I would also write custom GroupingComparator and SortingComparator to make sure grouping happens only on uniqueCarrier, and that sorting behaves according to CompositeKey compareTo():

public class CompositeGroupingComparator extends WritableComparator {
    public CompositeGroupingComparator() {
        super(CompositeKey.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        CompositeKey first = (CompositeKey) a;
        CompositeKey second = (CompositeKey) b;

        return first.getUniqueCarrier().compareTo(second.getUniqueCarrier());
    }
}

public class CompositeSortingComparator extends WritableComparator {
    public CompositeSortingComparator()
    {
        super (CompositeKey.class, true);
    }

    @Override
    public int compare (WritableComparable a, WritableComparable b){
        CompositeKey first = (CompositeKey) a;
        CompositeKey second = (CompositeKey) b;

        return first.compareTo(second);
    }
}

Then, tell your Driver to use those two:

job.setSortComparatorClass(CompositeSortingComparator.class);
job.setGroupingComparatorClass(CompositeGroupingComparator.class);

Edit: Also see Pradeep's suggestion of implementing RawComparator to prevent having to unmarshall to an Object each time, if you want to optimize further.

Upvotes: 0

Pradeep Gollakota

Reputation: 2181

Looks like you have an equality problem since you're not using uniqueCarrier in your compareTo method. You need to use uniqueCarrier in your compareTo and equals methods (also define an equals method). From the java lang reference

The natural ordering for a class C is said to be consistent with equals if and only if e1.compareTo(e2) == 0 has the same boolean value as e1.equals(e2) for every e1 and e2 of class C. Note that null is not an instance of any class, and e.compareTo(null) should throw a NullPointerException even though e.equals(null) returns false.

You can also implement a RawComparator so that you can compare them without deserializing for some faster performance.

However, I recommend (as I always do) to not write things like Secondary Sort yourself. These have been implemented (as well as dozens of other optimizations) in projects like Pig and Hive. E.g. if you were using Hive, all you need to write is:

SELECT ...
FROM my_table
ORDER BY month, carrier;

The above is a lot simpler to write than trying to figure out how to write Secondary Sorts (and eventually when you need to use it again, how to do it in a generic fashion). MapReduce should be considered a low level programming paradigm and should only be used (IMHO) when you need high performance optimizations that you don't get from higher level projects like Pig or Hive.

EDIT: Forgot to mention about Grouping comparators, see Matt's answer

Upvotes: 0

Creating composite key class for Secondary Sort

Answers (2)

Related Questions