Reputation: 1474
I need a variable that is shared between reduce tasks and that each reduce task can read and write atomically. I need it to give a unique identifier to each file created by a reduce task (the number of files created by the reduce tasks is not deterministic).
Thanks
Upvotes: 0
Views: 1229
Reputation: 10652
All the output files produced by the reducers already have unique names, such as part-r-00001. If you need that number in your code, you can read the partition number.
Centralized counters that must be guaranteed unique break a lot of the scalability of Hadoop.
So if you need something different, I would use something like a SHA-1 hash of the reducer's task ID to get a value that is unique across multiple jobs.
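A minimal sketch of that hashing idea, using only the JDK. The class name `TaskIdHash`, the helper `fileName`, and the sample task ID string are made up for illustration; inside a real reducer the ID string would presumably come from something like `context.getTaskAttemptID().getTaskID().toString()`. A per-task local counter is appended because one reducer may create several files.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class TaskIdHash {
    // Returns the SHA-1 digest of the given string as 40 lowercase hex characters.
    public static String sha1Hex(String id) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(id.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical naming scheme: hash of the task ID plus a counter local to the task.
    public static String fileName(String taskId, int localIndex) {
        return sha1Hex(taskId) + "-" + localIndex;
    }

    public static void main(String[] args) {
        // Made-up task ID for illustration; a real one comes from the job context.
        String taskId = "task_201301011200_0001_r_000003";
        System.out.println(fileName(taskId, 0));
        System.out.println(fileName(taskId, 1));
    }
}
```

No coordination between reducers is needed here: the task ID already differs per reducer and per job, and the counter only has to be unique within one task.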
Upvotes: 0
Reputation: 8088
As I understand it, ZooKeeper is built specifically to provide atomic access to cluster-wide variables.
Upvotes: 1
Reputation: 20969
I would recommend using FileSystem.createNewFile().
Have a look here:
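As a rough sketch of the idea (not taken from the linked page): a task can claim an identifier by atomically creating a marker file and moving on to the next number if the file already exists. To keep the example runnable without a cluster, it uses `java.nio.file.Files.createFile` on a local directory as a stand-in; with Hadoop the equivalent check would presumably be the boolean returned by `FileSystem.createNewFile(Path)`. The class name `AtomicIdClaim` and the `id-N` naming are assumptions.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class AtomicIdClaim {
    // Tries id-0, id-1, ... and returns the first number whose marker file
    // this caller managed to create. Files.createFile checks for existence
    // and creates the file as one atomic operation, so two callers cannot
    // both claim the same number.
    public static int claimNextId(Path dir) throws IOException {
        for (int id = 0; ; id++) {
            try {
                Files.createFile(dir.resolve("id-" + id));
                return id;
            } catch (FileAlreadyExistsException taken) {
                // Someone else already owns this id; try the next one.
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary local directory stands in for a shared HDFS directory.
        Path dir = Files.createTempDirectory("ids");
        System.out.println(claimNextId(dir));
        System.out.println(claimNextId(dir));
    }
}
```

Scanning from zero costs one failed create per identifier already taken, so this only suits a modest number of files.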
Upvotes: 0