ihadanny

Reputation: 4483

Does mapred-site.xml on the client machine have to be the same as the one in the Hadoop cluster?

This is related to Hadoop API configuration on the client machine.

If we try to keep mapred-site.xml on the client machine as minimal as possible, specifying only mapred.job.tracker, then mapred-default.xml from inside the Hadoop jar takes over and supplies unwanted properties, e.g. mapred.tasktracker.map.tasks.maximum=2. These values are then submitted with the task and override those in the cluster config :(

What's the right approach here? Do you replicate the files from your cluster onto your client machine?

Upvotes: 0

Views: 861

Answers (1)

Praveen Sripati

Reputation: 33545

the mapred-default from inside the hadoop.jar takes over, and puts unwanted properties, e.g. mapred.tasktracker.map.tasks.maximum=2. Then these values are submitted with the task, and override those in the cluster config :(

I assume you are referring to the properties set in the job.xml file. Setting properties such as mapred.tasktracker.map.tasks.maximum on the client side should have no effect, since that property is read by the TaskTracker daemon at startup. Although mapred.tasktracker.map.tasks.maximum appears in job.xml, it is not job-specific.
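To make the point above concrete, here is a toy model of the layering in Python. It is only a sketch: plain dicts stand in for Hadoop's configuration objects, and the JobTracker address and the cluster slot count of 8 are made-up values for illustration.

```python
# Toy model of Hadoop 1.x configuration layering. Plain dicts stand in for
# the real configuration objects; this is an illustration, not Hadoop code.

# Default shipped in mapred-default.xml inside the Hadoop jar.
MAPRED_DEFAULT = {"mapred.tasktracker.map.tasks.maximum": "2"}


def effective_config(defaults, site):
    """Site values override defaults; unset properties keep the default."""
    merged = dict(defaults)
    merged.update(site)
    return merged


# Client side: minimal mapred-site.xml (host and port are hypothetical).
client_conf = effective_config(
    MAPRED_DEFAULT, {"mapred.job.tracker": "jobtracker.example.com:8021"}
)

# Cluster side: the TaskTracker's own mapred-site.xml (8 is a made-up value).
tasktracker_conf = effective_config(
    MAPRED_DEFAULT, {"mapred.tasktracker.map.tasks.maximum": "8"}
)

# job.xml is a snapshot of the client's whole effective configuration,
# so it carries the default slot count even though the client never set it.
job_xml = dict(client_conf)


def map_slots(daemon_conf, job_conf):
    """The slot count comes from the daemon's config; job_conf is ignored."""
    return int(daemon_conf["mapred.tasktracker.map.tasks.maximum"])


print(job_xml["mapred.tasktracker.map.tasks.maximum"])
print(map_slots(tasktracker_conf, job_xml))
```

In this model the default shows up in job_xml, yet the slot count still comes from the daemon's own configuration.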

How did you verify that the properties have been overridden? Go to the JobTracker page (http://jotracker:50030/jobtracker.jsp) and check whether the particular property has actually been overridden for a TaskTracker.

what's the right approach here? do you replicate the files from your cluster into your client machine?

Just to avoid confusion, I would keep separate files on the client and on the nodes, put only the minimum required configuration properties in them, and let the other properties take their default values.
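As a rough sketch of what such a minimal client file could contain, the Python snippet below builds a Hadoop-style configuration document holding only mapred.job.tracker. The helper name build_site_xml and the host and port are hypothetical, chosen just for this example.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties):
    """Return Hadoop-style <configuration> XML text for the given properties."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Only the JobTracker address is set; the host and port are hypothetical.
client_site = build_site_xml({"mapred.job.tracker": "jobtracker.example.com:8021"})
print(client_site)
```

Every property left out of this file falls back to the default bundled in the Hadoop jar.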

According to Hadoop: The Definitive Guide:

Be aware that some properties have no effect when set in the client configuration. For example, if in your job submission you set mapred.tasktracker.map.tasks.maximum with the expectation that it would change the number of task slots for the tasktrackers running your job, then you would be disappointed, since this property is only honored if set in the tasktracker’s mapred-site.xml file. In general, you can tell the component where a property should be set by its name, so the fact that mapred.tasktracker.map.tasks.maximum starts with mapred.tasktracker gives you a clue that it can be set only for the tasktracker daemon. This is not a hard and fast rule, however, so in some cases you may need to resort to trial and error, or even reading the source.
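The naming heuristic from the quote can be sketched as a small lookup. The prefix table below is an illustrative assumption, not an exhaustive or authoritative list, and as the quote warns, a match is only a hint.

```python
# Illustrative prefix table (an assumption, not a complete list); adjust it
# for the Hadoop version you actually run.
PREFIX_TO_COMPONENT = [
    ("mapred.tasktracker.", "tasktracker"),
    ("mapred.jobtracker.", "jobtracker"),
    ("dfs.datanode.", "datanode"),
    ("dfs.namenode.", "namenode"),
]


def guess_component(property_name):
    """Guess which daemon reads a property, or None if no prefix matches."""
    for prefix, component in PREFIX_TO_COMPONENT:
        if property_name.startswith(prefix):
            return component
    return None


print(guess_component("mapred.tasktracker.map.tasks.maximum"))
print(guess_component("mapred.job.tracker"))
```

A result of None means the name alone gives no clue, which is where trial and error or reading the source comes in.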

Upvotes: 1
