Reputation: 4483
This question is related to Hadoop API configuration on the client machine.
If we try to keep the mapred-site.xml on the client machine as minimal as possible (specifying only mapred.job.tracker), then the mapred-default from inside the hadoop.jar takes over and puts in unwanted properties, e.g. mapred.tasktracker.map.tasks.maximum=2. Then these values are submitted with the task and override those in the cluster config :(
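To make the layering described above concrete, here is a minimal Java sketch that models it with plain maps rather than the real `org.apache.hadoop.conf.Configuration` class. It assumes the client-side merge works as the question describes: defaults from mapred-default.xml are loaded first, the client's minimal mapred-site.xml is overlaid on top, and the merged result is what gets submitted with the job. The class name, method names and the `jobtracker:8021` address are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of client-side config layering; not the Hadoop API.
final class ClientConfigModel {

    // Stand-in for a few entries of the mapred-default.xml bundled in the Hadoop jar.
    static Map<String, String> bundledDefaults() {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("mapred.job.tracker", "local");
        defaults.put("mapred.tasktracker.map.tasks.maximum", "2");
        return defaults;
    }

    // Later layers override earlier ones, so the site file wins on conflicting keys.
    static Map<String, String> merge(Map<String, String> defaults, Map<String, String> site) {
        Map<String, String> merged = new LinkedHashMap<>(defaults);
        merged.putAll(site);
        return merged;
    }

    public static void main(String[] args) {
        // Minimal client mapred-site.xml: only the JobTracker address (hypothetical host:port).
        Map<String, String> clientSite = new LinkedHashMap<>();
        clientSite.put("mapred.job.tracker", "jobtracker:8021");

        Map<String, String> jobConf = merge(bundledDefaults(), clientSite);
        // The default slot count rides along even though the client never set it.
        System.out.println(jobConf);
    }
}
```

Running `main` prints `{mapred.job.tracker=jobtracker:8021, mapred.tasktracker.map.tasks.maximum=2}`, which is the situation the question is worried about.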
What's the right approach here? Do you replicate the files from your cluster to your client machine?
Upvotes: 0
Views: 861
Reputation: 33545
The mapred-default from inside the hadoop.jar takes over and puts in unwanted properties, e.g. mapred.tasktracker.map.tasks.maximum=2. Then these values are submitted with the task and override those in the cluster config :(
I assume you are referring to the properties set in the job.xml file. Setting some properties, like mapred.tasktracker.map.tasks.maximum, on the client side should have no effect, since mapred.tasktracker.map.tasks.maximum is read by the TaskTracker daemon at startup. Although mapred.tasktracker.map.tasks.maximum appears in the job.xml, it's not job specific.
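One way to picture why the client-side value is harmless: the TaskTracker fixes its slot count once, from its own node-local configuration, when the daemon starts, and does not consult a job's configuration for it. The Java sketch below models that behaviour with a hypothetical `TaskTrackerModel` class; it is an illustration under that assumption, not the actual TaskTracker code.

```java
import java.util.Map;

// Hypothetical model of a TaskTracker daemon; not Hadoop's implementation.
final class TaskTrackerModel {
    private final int mapSlots;

    // The slot count is read once from the node-local config at daemon startup
    // (falling back to 2, the mapred-default value).
    TaskTrackerModel(Map<String, String> localSiteConfig) {
        this.mapSlots = Integer.parseInt(
            localSiteConfig.getOrDefault("mapred.tasktracker.map.tasks.maximum", "2"));
    }

    // Running a job does not re-read daemon-level settings from the job's config;
    // the jobConf parameter is deliberately ignored.
    int mapSlotsWhileRunning(Map<String, String> jobConf) {
        return mapSlots;
    }

    public static void main(String[] args) {
        TaskTrackerModel tt = new TaskTrackerModel(
            Map.of("mapred.tasktracker.map.tasks.maximum", "8"));
        // The job carries the client's default of 2, but the daemon keeps its own 8 slots.
        System.out.println(tt.mapSlotsWhileRunning(
            Map.of("mapred.tasktracker.map.tasks.maximum", "2")));
    }
}
```

`Map.of` needs Java 9 or later; with the values above, `main` prints `8`.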
How did you verify that the properties have been overridden? Go to the JobTracker page (http://jotracker:50030/jobtracker.jsp) and check whether the particular property has actually been overridden for a TaskTracker.
What's the right approach here? Do you replicate the files from your cluster to your client machine?
Just to avoid confusion, I would keep separate files on the client and on the nodes, put only the minimum required configuration properties in each, and let the other properties take their default values.
According to Hadoop: The Definitive Guide:
Be aware that some properties have no effect when set in the client configuration. For example, if in your job submission you set mapred.tasktracker.map.tasks.maximum with the expectation that it would change the number of task slots for the tasktrackers running your job, then you would be disappointed, since this property is only honored if set in the tasktracker’s mapred-site.xml file. In general, you can tell the component where a property should be set by its name, so the fact that mapred.tasktracker.map.tasks.maximum starts with mapred.tasktracker gives you a clue that it can be set only for the tasktracker daemon. This is not a hard and fast rule, however, so in some cases you may need to resort to trial and error, or even reading the source.
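The naming rule of thumb from the book can be written down as a small helper. This is a hypothetical Java sketch of that heuristic only; the `mapred.jobtracker.` branch is an assumption that extends the same idea to the JobTracker, and, as the quote warns, the rule is not hard and fast, so an `"unknown"` result means checking the documentation or the source.

```java
// Hypothetical helper encoding the property-name heuristic; not a Hadoop API.
final class PropertyScope {

    // Guess which component a property must be set for, based on its name prefix.
    static String likelyScope(String propertyName) {
        if (propertyName.startsWith("mapred.tasktracker.")) {
            return "tasktracker";   // honored only in the tasktracker's own mapred-site.xml
        }
        if (propertyName.startsWith("mapred.jobtracker.")) {
            return "jobtracker";    // assumed: honored only in the jobtracker's own config
        }
        return "unknown";           // may be job-level; verify against docs or source
    }

    public static void main(String[] args) {
        System.out.println(likelyScope("mapred.tasktracker.map.tasks.maximum")); // tasktracker
        System.out.println(likelyScope("mapred.job.tracker"));                   // unknown
    }
}
```

Note that mapred.job.tracker falls through to `"unknown"` because it does not share the daemon prefix, which is exactly the kind of case where the heuristic stops helping.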
Upvotes: 1