Reputation: 211
How can you pass a small amount of metadata collected in the Mapper to the Reducer? In my specific case I only want to pass two long values, so I don't want to use MultipleOutputFormat or MultipleOutputs for this.
Some variants I have tried:
(1)
Mapper
context.getCounter("Countergroup", "Counter").increment(1);
Reducer
counter = context.getCounter("Countergroup", "Counter").getValue();
Counters are not propagated to other running tasks while the job is in progress, so the call in the Reducer returns 0.
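For what it's worth, counters incremented in the mapper do become reliably readable once the job has finished, just not from inside a running reducer. A minimal driver-side sketch (the group/counter names match variant (1); the helper class is illustrative):

```java
import java.io.IOException;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;

// Counters incremented in map tasks are aggregated by the framework and
// can be read from the driver after the job completes.
public class CounterAfterCompletion {
    public static long runAndReadCounter(Job job)
            throws IOException, InterruptedException, ClassNotFoundException {
        job.waitForCompletion(true);
        Counter c = job.getCounters().findCounter("Countergroup", "Counter");
        return c.getValue();
    }
}
```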
(2)
Mapper
context.getConfiguration().setInt("Counter", countTotal);
Reducer
counter = context.getConfiguration().getInt("Counter", 0);
As expected, the Configuration cannot be changed while the job is running (it was worth a try).
There have already been questions about this problem, but I could not find a working answer; also, the API has changed. I am using Hadoop 0.20.2.
Similar questions:
Passing values from Mapper to Reducer
Accessing a mapper's counter from a reducer (this looks promising, but it seems as if it does not work with the 0.20.2 API)
Upvotes: 1
Views: 1781
Reputation: 1270
If counters do not solve your problem (passing two long values from the mapper to the reducer in your specific case), another approach is to take advantage of the order inversion pattern.
In this pattern, you emit an extra key-value pair from the map, choosing the key so that it becomes the first key the reducer receives (exploiting the fact that the reducer receives keys in sorted order). For example, if the keys you emit are numeric values from 1 to 1000, your dummy key could be 0. Since the reducer receives keys in sorted order, it is guaranteed to process the dummy key before any other key.
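A sketch of the idea, assuming the new (org.apache.hadoop.mapreduce) API, numeric keys starting at 1, and the two long totals being emitted under the dummy key 0 (class names and value encoding are illustrative, not from the question):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class OrderInversionSketch {

    public static class CountMapper
            extends Mapper<LongWritable, Text, LongWritable, LongWritable> {
        private long countTotal = 0;

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // ... normal map logic, updating countTotal along the way ...
            context.write(key, new LongWritable(1));
            countTotal++;
        }

        @Override
        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            // Emit the metadata under the dummy key 0, which sorts before
            // all real keys (assumed here to start at 1).
            context.write(new LongWritable(0), new LongWritable(countTotal));
        }
    }

    public static class CountReducer
            extends Reducer<LongWritable, LongWritable, LongWritable, LongWritable> {
        private long metadata = 0;

        @Override
        protected void reduce(LongWritable key, Iterable<LongWritable> values,
                Context context) throws IOException, InterruptedException {
            if (key.get() == 0) {
                // First key the reducer sees: collect the mapper-side totals.
                for (LongWritable v : values) {
                    metadata += v.get();
                }
                return; // do not emit the dummy key itself
            }
            // ... normal reduce logic, with metadata already available ...
        }
    }
}
```

Note that with more than one reducer, the dummy key would be routed to only one of them by the default partitioner, so you would need either a single reducer or a custom partitioner that sends the dummy key to every partition.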
You additionally have setup() and cleanup() methods in the new API (the old API has configure() and close() for the same purpose). They run exactly once per task: setup() before the first call to map()/reduce() of that task, and cleanup() after the last one, which makes cleanup() a natural place to emit per-task totals.
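The hooks look like this in the new API (the body comments are illustrative suggestions, not part of the framework contract):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Per-task lifecycle hooks: setup() runs once before the first map() call
// of a task, cleanup() once after the last map() call of that task.
public class LifecycleSketch
        extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void setup(Context context)
            throws IOException, InterruptedException {
        // e.g. read job Configuration values or initialize per-task state
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        // e.g. emit accumulated per-task totals (the dummy-key trick above)
    }
}
```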
Upvotes: 1