Ahmed

Reputation: 65

Change default configuration on Hadoop slave nodes?

I am currently passing some values as command-line arguments and parsing them with GenericOptionsParser, with my job class implementing Tool.

From the master node I run something like this:

bin/hadoop jar MYJAR.jar MYJOB -D mapred.reduce.tasks=13

But this only gets applied on the master. Is there any way to have it applied on the slaves as well?

I use Hadoop 0.20.203.

Any help is appreciated.

Upvotes: 1

Views: 1038

Answers (1)

Praveen Sripati

Reputation: 33495

But this only gets applied on the master. Is there any way to have it applied on the slaves as well?

According to "Hadoop: The Definitive Guide", setting some properties on the client side has no effect; they need to be set in the configuration file of the relevant daemon instead. Note that you can also define new properties in the configuration files and read them in your code through the Configuration object.

Be aware that some properties have no effect when set in the client configuration. For example, if in your job submission you set mapred.tasktracker.map.tasks.maximum with the expectation that it would change the number of task slots for the tasktrackers running your job, then you would be disappointed, since this property is honored only if set in the tasktracker’s mapred-site.xml file. In general, you can tell the component where a property should be set by its name, so the fact that mapred.tasktracker.map.tasks.maximum starts with mapred.tasktracker gives you a clue that it can be set only for the tasktracker daemon. This is not a hard and fast rule, however, so in some cases you may need to resort to trial and error, or even reading the source.
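The naming clue described in the quote can be sketched as a simple prefix check. This is only an illustration: the class name PropertyScopeHint, the method likelyComponent, and the short prefix list are assumptions made for this example, not part of Hadoop, and as the quote warns, the prefix is a hint rather than a guarantee.

```java
// Illustrative sketch only: guesses which daemon a property belongs to from its prefix.
// The prefix list is a small hand-picked sample, not a complete or authoritative table.
final class PropertyScopeHint {

    static String likelyComponent(String property) {
        if (property.startsWith("mapred.tasktracker.")) return "tasktracker daemon";
        if (property.startsWith("mapred.jobtracker.")) return "jobtracker daemon";
        if (property.startsWith("dfs.datanode.")) return "datanode daemon";
        if (property.startsWith("dfs.namenode.")) return "namenode daemon";
        // No daemon-style prefix: possibly a per-job setting, but this is not conclusive.
        return "unknown (check the documentation or the source)";
    }

    public static void main(String[] args) {
        // Property from the quote: the prefix points at the tasktracker daemon.
        System.out.println(likelyComponent("mapred.tasktracker.map.tasks.maximum"));
        // Property from the question: no daemon prefix, so the heuristic cannot tell.
        System.out.println(likelyComponent("mapred.reduce.tasks"));
    }
}
```

If a property you pass with -D matches one of the daemon prefixes, that is a sign it probably has to go into the configuration file on the nodes running that daemon.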

You can also configure the environment of the Hadoop daemons using the HADOOP_*_OPTS variables in the conf/hadoop-env.sh file.

Again, according to "Hadoop: The Definitive Guide":

Do not confuse setting Hadoop properties using the -D property=value option to GenericOptionsParser (and ToolRunner) with setting JVM system properties using the -Dproperty=value option to the java command. The syntax for JVM system properties does not allow any whitespace between the D and the property name, whereas GenericOptionsParser requires them to be separated by whitespace.

JVM system properties are retrieved from the java.lang.System class, whereas Hadoop properties are accessible only from a Configuration object.
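To make the difference between the two -D forms concrete, here is a minimal sketch using only the JDK. The class DashDSketch and its parseHadoopStyle method are hypothetical stand-ins written for this answer; they mimic the behaviour described in the quote (a separate -D argument followed by key=value) and are not Hadoop's actual GenericOptionsParser. The map plays the role of the Configuration object.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the "-D key=value" handling described above; NOT Hadoop's real parser.
final class DashDSketch {

    // Collects "-D", "key=value" argument pairs into a map (standing in for a Configuration).
    // Every other argument, including the JVM-style "-Dkey=value", is appended to 'remaining'.
    static Map<String, String> parseHadoopStyle(String[] args, List<String> remaining) {
        Map<String, String> conf = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i++) {
            boolean isPair = args[i].equals("-D")
                    && i + 1 < args.length
                    && args[i + 1].contains("=");
            if (isPair) {
                String[] kv = args[++i].split("=", 2); // consume the key=value argument
                conf.put(kv[0], kv[1]);
            } else {
                remaining.add(args[i]);
            }
        }
        return conf;
    }

    public static void main(String[] args) {
        List<String> remaining = new ArrayList<>();
        Map<String, String> conf = parseHadoopStyle(
                new String[] {"-D", "mapred.reduce.tasks=13", "input", "output"}, remaining);

        // Hadoop-style property: found in the configuration map.
        System.out.println("job property: " + conf.get("mapred.reduce.tasks"));
        System.out.println("remaining args: " + remaining);

        // JVM system property: a separate namespace, set with "java -Dkey=value"
        // and read through java.lang.System.
        System.out.println("JVM property: " + System.getProperty("mapred.reduce.tasks"));
    }
}
```

The point of the sketch is that the two mechanisms use separate namespaces: a value passed as -D key=value to the job ends up in the job configuration, while System.getProperty only sees what was given to the java command itself.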

Upvotes: 3
