florian

Reputation: 368

How do I create a new, unique key in a Hadoop Reducer

In a Hadoop Reducer, I would like to create and emit new keys under specific conditions, and I'd like to ensure that these keys are unique.

The pseudo-code for what I want goes like:

@Override
protected void reduce(WritableComparable key, Iterable<Writable> values, Context context) 
                       throws IOException, InterruptedException {
     // do stuff:
     // ...
     // write original key:
     context.write(key, data);
     // write extra key:
     if (someConditionIsMet) {
       WritableComparable extraKey = createNewKey();
       context.write(extraKey, moreData);
     }
}

So I now have two questions:

  1. Is it possible at all to emit more than one distinct key in reduce()? I know that the keys won't be re-sorted, but that is fine for me.
  2. The extra key has to be unique across all reducers - both for application reasons and because I think it would otherwise violate the contract of the reduce stage. What is a good way to generate a key that is unique across reducers (and possibly across jobs)?

    Maybe get reducer/job IDs and incorporate that into key generation?

Upvotes: 2

Views: 1011

Answers (1)

Chris White

Reputation: 30089

  1. Yes, you can output any number of keys.
  2. You can incorporate the task attempt information into your key to make it unique within the job (across the reducers, and even covering speculative execution if you want). You can acquire this information from the reducer's Context.getTaskAttemptID() method and then pull out the reducer ID number with TaskAttemptID.getTaskID().getId(); a sketch follows below.
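For illustration, a minimal sketch of such a reducer, assuming Text keys and values; someConditionIsMet(), data, and moreData are hypothetical placeholders carried over from the question's pseudo-code:

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class UniqueKeyReducer extends Reducer<Text, Text, Text, Text> {

    // Per-reducer counter; combined with the task ID it makes
    // each generated key unique across all reducers in the job.
    private long counter = 0;

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        Text data = new Text("data");         // placeholder aggregation result
        Text moreData = new Text("moreData"); // placeholder extra payload

        // write original key:
        context.write(key, data);

        // write extra key:
        if (someConditionIsMet()) {
            // getTaskID().getId() is unique per reducer within the job;
            // prefixing the job ID extends uniqueness across jobs.
            int taskId = context.getTaskAttemptID().getTaskID().getId();
            String jobId = context.getJobID().toString();
            Text extraKey = new Text(jobId + "_" + taskId + "_" + counter++);
            context.write(extraKey, moreData);
        }
    }

    // Placeholder for the question's condition.
    private boolean someConditionIsMet() {
        return true;
    }
}

Since each reducer owns a distinct task ID and increments its own counter, no two reducers can produce the same extraKey, and the job ID prefix additionally keeps keys distinct across jobs.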

Upvotes: 2
