kapibarasama

Reputation: 211

Pass small number of values from Mapper to Reducer

How can you pass a small amount of metadata collected in the Mapper to the Reducer? In my specific problem I only want to pass two long values, so I don't want to use MultipleOutputFormat or MultipleOutputs for this.

Some variants I have tried:

(1)

Mapper

    context.getCounter("Countergroup", "Counter").increment(1);

Reducer

    counter = context.getCounter("Countergroup", "Counter").getValue(); 

Counter values are not propagated to the Reducer while the job is running, so this call in the Reducer returns 0.



(2)

Mapper

    context.getConfiguration().setInt("Counter", countTotal);

Reducer

    counter = context.getConfiguration().getInt("Counter", 0);          

Unsurprisingly, the Configuration cannot be modified while the job is running (it was worth a try).

There have already been questions about this problem, but I could not find a working answer. Also, the API has changed; I am using Hadoop 0.20.2.



Similar questions:

Passing values from Mapper to Reducer

Accessing a mapper's counter from a reducer (this looks promising, but it seems as if it does not work with the 0.20.2 API)

Upvotes: 1

Views: 1781

Answers (1)

Nishant Nagwani

Reputation: 1270

If counters do not solve your problem (passing two long values from the Mapper to the Reducer in your specific case), another approach is to take advantage of the order inversion pattern.

In this pattern, the map emits an extra key-value pair whose key sorts before every real key, so it becomes the first key the reducer receives (taking advantage of the fact that the reducer receives its keys in sorted order). For example, if the keys you are emitting are numeric values from 1 to 1000, your dummy key could be 0. Since the reducer receives keys in sorted order, it is guaranteed to process the dummy key before any other key.
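A rough sketch of the reducer side (new API; the LongWritable types, the dummy key 0 and the single-reducer assumption are mine, purely to illustrate the idea):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.mapreduce.Reducer;

    // Assumes real keys are >= 1, the mappers emit their metadata under key 0,
    // and there is a single reducer (otherwise only one partition sees key 0).
    public class MetadataReducer
            extends Reducer<LongWritable, LongWritable, LongWritable, LongWritable> {

        private long mapperMetadata = 0;   // filled in before any real key is processed

        @Override
        protected void reduce(LongWritable key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            if (key.get() == 0) {
                // The dummy key arrives first because keys come in sorted order:
                // sum the metadata values emitted by each map task.
                for (LongWritable v : values) {
                    mapperMetadata += v.get();
                }
                return;   // nothing to write for the dummy key
            }
            // ... normal reduce logic; mapperMetadata is already available here ...
        }
    }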

You additionally have setup() and cleanup() methods in the new API (the old API has the similar configure() and close() methods). They run exactly once per task: setup() before the first record of a map/reduce task is processed and cleanup() after the last one, so they are a good place to emit such a dummy pair, as sketched below.
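For example, a mapper could accumulate its metadata while mapping and emit it exactly once from cleanup(); again a sketch with assumed types, field names, and the dummy key 0:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class MetadataMapper
            extends Mapper<LongWritable, Text, LongWritable, LongWritable> {

        private long countTotal = 0;   // metadata accumulated over this task's records

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            countTotal++;                              // collect the metadata
            // ... normal map output, e.g. context.write(realKey, someValue) ...
        }

        @Override
        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            // Runs once per map task, after the last record: emit the metadata
            // under the dummy key 0 so it sorts before every real key.
            context.write(new LongWritable(0), new LongWritable(countTotal));
        }
    }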

Upvotes: 1
