Shahryar

Reputation: 1474

Shared variable in MapReduce

I need a variable that is shared between reduce tasks and that each reduce task can read and write atomically. I need it to give a unique identifier to each file created by the reduce tasks (the number of files the reduce tasks create is not deterministic).

Thanks

Upvotes: 0

Views: 1229

Answers (3)

Niels Basjes

Reputation: 10652

All the output files produced by the reducers already have unique names, such as part-r-00001. If you need that number in your code, you can read the partition number.
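A minimal sketch of how the partition number can replace a shared counter: each reducer keeps its own local counter and prefixes it with its partition number, so names never collide across reducers. The class name `ReducerFileNamer` and the name pattern are my own, made up for illustration. In a real job I would expect the partition number to come from something like `context.getTaskAttemptID().getTaskID().getId()`, but here it is just passed in as an int.

```java
import java.util.Locale;

// Sketch: unique file names from (partition number, reducer-local counter).
// No state is shared between reducers, so nothing needs to be atomic across tasks.
class ReducerFileNamer {
    private final int partition;   // assumed to be this reducer's partition number
    private long nextLocalId = 0;  // lives only inside this reducer

    ReducerFileNamer(int partition) {
        this.partition = partition;
    }

    // Returns a new name on every call, e.g. part-r-00003-0, part-r-00003-1, ...
    String nextName() {
        return String.format(Locale.ROOT, "part-r-%05d-%d", partition, nextLocalId++);
    }

    public static void main(String[] args) {
        ReducerFileNamer namer = new ReducerFileNamer(3);
        System.out.println(namer.nextName());
        System.out.println(namer.nextName());
    }
}
```

Since the number of files per reducer does not have to be known up front, this also covers the non-deterministic case from the question.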

Centralized counters that must be guaranteed unique break a lot of the scalability of Hadoop.

So if you need something different, I would use something like a SHA-1 of the reducer's task ID to get something that is unique across multiple jobs.
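Roughly, the hashing part could look like the sketch below. It uses only the JDK's `MessageDigest`. The class name `TaskIdHasher` and the attempt ID string in `main` are hypothetical; in a real reducer the string would be whatever the task ID prints as.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: turn a task ID string into a 40-character SHA-1 hex identifier.
class TaskIdHasher {
    static String sha1Hex(String taskId) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(taskId.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the algorithms every Java platform must provide
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical task attempt ID, for illustration only
        System.out.println(sha1Hex("attempt_201201011200_0001_r_000003_0"));
    }
}
```

The hash is only as unique as its input, so this relies on the task ID already containing the job ID.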

Upvotes: 0

David Gruzman

Reputation: 8088

As I understand it, ZooKeeper is built specifically to provide atomic access to cluster-wide variables.

Upvotes: 1

Thomas Jungblut

Reputation: 20969

I would recommend using FileSystem.createNewFile().

Have a look here:

http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/fs/FileSystem.html#createNewFile%28org.apache.hadoop.fs.Path%29
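One way to read this suggestion: use marker files as claims on IDs, where whoever manages to create the file `id-N` owns the number N. The sketch below shows the idea on the local file system with `java.io.File.createNewFile()`, which is a stand-in and not the Hadoop call. It assumes that `FileSystem.createNewFile()` likewise returns true for only one caller per path. The class name `IdClaimer` and the `id-N` naming are made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Sketch: claim unique numbers by atomically creating marker files.
// Local-file stand-in for the same pattern on a shared file system.
class IdClaimer {
    // Tries id-0, id-1, ... and returns the first number whose marker
    // file was created by this call (createNewFile returns false if it exists).
    static int claimNextId(File dir) throws IOException {
        for (int id = 0; ; id++) {
            File marker = new File(dir, "id-" + id);
            if (marker.createNewFile()) {
                return id;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("ids").toFile();
        System.out.println(claimNextId(dir));
        System.out.println(claimNextId(dir));
    }
}
```

The linear scan costs one file system call per ID that is already taken, so on a large cluster this may run into the scalability concern about centralized counters raised in another answer.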

Upvotes: 0
