Reputation: 28022
I understood the way of sorting the values of a particular key before the key enters the reducer. I learned that it can be done by writing three methods viz, keycomparator, partitioner and valuegrouping.
Now, when valuegrouping runs, it basically groups all the values associated with the natural key, right? So when it groups all the values for the natural key, what will be the actual key that is sent along with a set of sorted values to the reducer? The natural key would have been associated with more than one type of entity (the second part of the composite key). What will be the composite key sent to the reducer?
ap
Upvotes: 2
Views: 859
Reputation: 30089
This may be surprising to know, but each iteration of the values Iterable actually updates the key reference too:
protected void reduce(K key, Iterable<V> values, Context context) {
for (V value : values) {
// key object contents will update for each iteration of this loop
}
}
I know this works for the new mapreduce API, i haven't traced it for the old mapred API.
So in answer to your question, all the keys will be available, the first key will relate to the first sorted key of the group.
EDIT: Some additional information as to how and why this works:
There are two comparators that the reducer uses to process the key/value pairs output by the map stage:
Under the hood, the reference to the key and value never changes, each call to Iterable.Iterator.next() advances the pointer in the underlying byte stream to the next KV pair. If the key grouper determines that the current set of keys bytes and previous set are comparatively the same key, then the hasNext method of the value Iterable.iterator() will return true, otherwise false. If true is returned, the bytes are deserialized into the Key and Value instances for consumption in your reduce method.
Upvotes: 4