Reputation: 45
I was wondering what the standard practice is for reading Java properties files in MapReduce applications, and how to pass the file's location when submitting (starting) a job. In regular Java applications you can pass the location of the properties file as a JVM system property (-D) or as an argument to the main method. What is the best alternative (standard practice) for MapReduce jobs? Some good examples would be very helpful.
Upvotes: 1
Views: 1600
Reputation: 3154
The best alternative is to use `DistributedCache`, though it may not be the standard way. There may be other approaches, but I haven't seen any code using anything else so far.
The idea is to add the file to the cache, then read it inside the `setup` method of the mapper or reducer and load the values into a `Properties` object or a `Map`. If you need a snippet, I can add one.
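A minimal sketch of the task-side half of this idea, using only the JDK so it runs without Hadoop on the classpath. The Hadoop calls appear only in comments and are assumptions about your setup: I am assuming the Hadoop 2.x `Job.addCacheFile(URI)` API, and the HDFS path, the symlink name `app.properties`, and the class and method names are all hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class CachedPropertiesSketch {

    // Driver side (assumed Hadoop 2.x API, kept as a comment so this compiles without Hadoop):
    //   job.addCacheFile(new URI("/conf/app.properties#app.properties"));
    // The "#app.properties" fragment asks Hadoop to expose the cached file under
    // that name in each task's working directory. The HDFS path is hypothetical.

    // Task side: intended to be called from Mapper.setup(Context) or
    // Reducer.setup(Context), e.g. loadFromWorkingDir("app.properties").
    static Properties loadFromWorkingDir(String localName) {
        Path local = Paths.get(localName);
        try (Reader reader = Files.newBufferedReader(local, StandardCharsets.UTF_8)) {
            return load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parses key=value pairs from any Reader into a Properties object.
    static Properties load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Local demo: a temp file stands in for the file Hadoop would place
    // in the task's working directory.
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("app", ".properties");
        Files.write(tmp, "threshold=10\n".getBytes(StandardCharsets.UTF_8));
        Properties props = loadFromWorkingDir(tmp.toString());
        System.out.println(props.getProperty("threshold"));
        Files.delete(tmp);
    }
}
```

Loading once in `setup` and keeping the result in a field avoids re-reading the file for every record passed to `map` or `reduce`.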
Now that I think of it, my friend JtheRocker used another approach. He stored the entire contents of the file under a key in the `Configuration` object, retrieved its value in `setup`, and then parsed the pairs into a `Map`. In this case the file is read on the driver rather than on the task side, as it was in the first approach. While this suits small files and seems cleaner, purists may not like polluting `conf` at all.
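Here is a rough sketch of that second approach, again JDK-only so it can be run as-is. The `Configuration.set` and `Configuration.get` calls are shown in comments, and the key name `myapp.properties.contents`, the class name, and the helper names are hypothetical choices of mine, not anything JtheRocker's code is known to use.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class ConfPropertiesSketch {

    // Hypothetical key under which the file contents travel in the job configuration.
    static final String KEY = "myapp.properties.contents";

    // Driver side: read the whole file into one String, then (with Hadoop):
    //   conf.set(KEY, readAll(Paths.get(args[0])));
    static String readAll(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Task side: in setup(Context), with Hadoop:
    //   Map<String, String> settings = parse(context.getConfiguration().get(KEY));
    static Map<String, String> parse(String contents) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(contents));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> map = new HashMap<>();
        for (String name : props.stringPropertyNames()) {
            map.put(name, props.getProperty(name));
        }
        return map;
    }

    // Local demo of the round trip: String in, Map out.
    public static void main(String[] args) {
        Map<String, String> settings = parse("input.delimiter=,\nmax.retries = 3\n");
        System.out.println(settings);
    }
}
```

Because the contents are copied into every task's configuration, this is probably only sensible for small files, as noted above.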
I would like to see what other answers bring up.
Upvotes: 2