Reputation: 1474

shared variable in map reduce

I need a variable that is shared between reduce tasks and that each reduce task can read and write atomically. I need it in order to give a unique identifier to each file created by the reduce tasks (the number of files created by the reduce tasks is not deterministic).

Thanks

Upvotes: 0

Answers (3)

Niels Basjes

Reputation: 10652

All the output files produced by the reducers already have unique names, such as part-r-00001. If you need that number in your code, you can read the partition number.
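As a rough sketch of how the partition number could serve the question's goal without any shared state: each reducer combines its own partition number with a counter that is local to the task. The class name `ReducerFileNamer` and the name format are hypothetical. In a real job, the partition would presumably come from something like `context.getTaskAttemptID().getTaskID().getId()`; here it is passed in as a plain `int` so the example runs without Hadoop.

```java
import java.util.Locale;

// Hypothetical helper: builds file names that are unique within one job
// by pairing the reducer's partition number with a task-local counter.
final class ReducerFileNamer {
    private final int partition;   // assumed to be the reducer's partition number
    private int nextLocalIndex = 0; // counts files created by this task only

    ReducerFileNamer(int partition) {
        this.partition = partition;
    }

    // Returns e.g. "out-r-00003-0", then "out-r-00003-1", and so on.
    String nextName(String prefix) {
        return String.format(Locale.ROOT, "%s-r-%05d-%d",
                prefix, partition, nextLocalIndex++);
    }
}
```

Two reducers with different partition numbers can never produce the same name, however many files each one creates. Retried or speculative attempts of the same task share a partition number, so files written outside the job's normal output commit would probably need the attempt id in the name as well.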

Centralized counters that must be guaranteed unique break a lot of the scalability of Hadoop.

So if you need something different, I would use something like a SHA-1 hash of the reducer's task ID to get an identifier that is unique across multiple jobs.
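A minimal sketch of the hashing idea, using only the JDK. The class name `TaskIdHasher` is hypothetical, and the task ID is taken as a plain string; in a reducer it would presumably be the string form of the task (or task attempt) ID obtained from the context.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: turns a task ID string into a 40-character hex SHA-1.
final class TaskIdHasher {
    static String sha1Hex(String taskId) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(taskId.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes still print as two hex digits.
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required on every Java platform, so this is not expected.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```

For example, `TaskIdHasher.sha1Hex("task_201301010000_0001_r_000003")` (an illustrative, made-up ID) gives one identifier per task. Because the task ID includes the job ID, the hash differs between jobs; appending a task-local counter before hashing would give one identifier per file.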

Upvotes: 0