florian

Reputation: 368

How do I create a new, unique key in a Hadoop Reducer

In a Hadoop Reducer, I would like to create and emit new keys under specific conditions, and I'd like to ensure that these keys are unique.

The pseudo-code for what I want goes like:

@Override
protected void reduce(WritableComparable key, Iterable<Writable> values, Context context) 
                       throws IOException, InterruptedException {
     // do stuff:
     // ...
     // write original key:
     context.write(key, data);
     // write extra key:
     if (someConditionIsMet) {
       WritableComparable extraKey = createNewKey();
       context.write(extraKey, moreData);
     }
}

So I now have two questions:

  1. Is it possible at all to emit more than one distinct key in reduce()? I know that the keys won't be re-sorted, but that is fine for me.
  2. The extra key has to be unique across all reducers - both for application reasons and because I think it would otherwise violate the contract of the reduce stage. What is a good way to generate a key that is unique across reducers (and possibly across jobs)?

    Maybe get reducer/job IDs and incorporate that into key generation?

Upvotes: 2

Views: 1011

Answers (1)

Chris White

Reputation: 30089

  1. Yes, you can output any number of keys.
  2. You can incorporate the task attempt information into your key to make it unique within the job (across the reducers, and even covering speculative execution if you want). You can acquire this information from the reducer's Context.getTaskAttemptID() method and then pull out the reducer ID number with TaskAttemptID.getTaskID().getId(); a sketch follows below.
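For illustration, a minimal sketch of such a reducer, assuming Text keys and values; someConditionIsMet(), data, and moreData are hypothetical placeholders carried over from the question's pseudo-code:

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class UniqueKeyReducer extends Reducer<Text, Text, Text, Text> {

    // Per-reducer counter; combined with the task ID it makes
    // each generated key unique across all reducers in the job.
    private long counter = 0;

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        Text data = new Text("data");         // placeholder aggregation result
        Text moreData = new Text("moreData"); // placeholder extra payload

        // write original key:
        context.write(key, data);

        // write extra key:
        if (someConditionIsMet()) {
            // getTaskID().getId() is unique per reducer within the job;
            // prefixing the job ID extends uniqueness across jobs.
            int taskId = context.getTaskAttemptID().getTaskID().getId();
            String jobId = context.getJobID().toString();
            Text extraKey = new Text(jobId + "_" + taskId + "_" + counter++);
            context.write(extraKey, moreData);
        }
    }

    // Placeholder for the question's condition.
    private boolean someConditionIsMet() {
        return true;
    }
}

Since each reducer owns a distinct task ID and increments its own counter, no two reducers can produce the same extraKey, and the job ID prefix additionally keeps keys distinct across jobs.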

Upvotes: 2
