Reputation: 1174
I want to know how to set the number of
NameNodes
DataNodes
Mappers
Reducers
in code/configuration of Hadoop.
Upvotes: 1
Views: 824
Reputation: 38950
The number of NameNodes and DataNodes is decided by your business requirements. You don't set it programmatically.
If you need scalability, you have to look into concepts of HDFS federation.
Refer to this documentation page for more details about Federation.
In order to scale the name service horizontally, federation uses multiple independent Namenodes/namespaces. The Namenodes are federated; the Namenodes are independent and do not require coordination with each other. The Datanodes are used as common storage for blocks by all the Namenodes.
Number of mappers is decided by input splits.
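You can set the number of reducers programmatically, for example with Job.setNumReduceTasks(int). A requested number of mappers (mapreduce.job.maps), by contrast, is only a hint, and the framework is not obligated to obey that recommendation.
So in most cases it's better to let Hadoop decide the number of mappers and reducers.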
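To make the "mappers are decided by input splits" point concrete, here is a minimal sketch in plain Java (no Hadoop dependency) of how a file-based input format arrives at a split count. It mirrors the split-size rule in FileInputFormat (max(minSize, min(maxSize, blockSize)) with a 10% slop on the last split) as I understand it. The class name, method names and sample sizes are made up for illustration, and it assumes a single, non-empty, splittable file.

```java
// Illustrative sketch only: reproduces the split arithmetic, not Hadoop's actual classes.
final class SplitEstimate {
    // The last split may be up to 10% larger than splitSize before a new split is cut.
    static final double SPLIT_SLOP = 1.1;

    // Split size is the block size, clamped between the configured min and max split sizes.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits (and therefore map tasks) for one non-empty, splittable file.
    static long countSplits(long fileLength, long splitSize) {
        long splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the tail becomes the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Hypothetical values: 128 MB blocks, no min/max split size overrides.
        long splitSize = computeSplitSize(128 * mb, 1L, Long.MAX_VALUE);
        System.out.println(countSplits(1024 * mb, splitSize)); // 8
        System.out.println(countSplits(300 * mb, splitSize));  // 3
        System.out.println(countSplits(130 * mb, splitSize));  // 1 (within the 10% slop)
    }
}
```

In practice this means the lever for the mapper count is the split size (block size plus the mapreduce.input.fileinputformat.split.minsize and split.maxsize settings), not a direct "number of mappers" setting.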
Have a look at this related SE question:
How hadoop decides how many nodes will do map and reduce tasks
EDIT:
Hadoop cluster size:
1. Identify data requirements from your business needs.
2. Identify the replication factor for your data.
3. Calculate the data explosion rate in the coming years.
4. Once you have the above data, you can think about the ideal cluster size and hardware requirements for the NameNode and DataNodes.
Refer to this cloudera article for more details.
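As a rough illustration of the sizing steps above, the arithmetic can be sketched in plain Java. This is a back-of-the-envelope estimate under my own assumptions, not a formula from the Cloudera article: data grows by a fixed rate each year, every block is stored replicationFactor times, and each DataNode contributes a fixed amount of usable disk. All names and numbers are hypothetical.

```java
// Illustrative sketch only: a simple capacity estimate, ignoring compression,
// temporary/intermediate data and OS overhead.
final class ClusterSizeEstimate {
    // Raw HDFS capacity needed after `years` of compound growth, including replication.
    static double rawStorageTb(double initialDataTb, double yearlyGrowthRate,
                               int years, int replicationFactor) {
        double projectedDataTb = initialDataTb * Math.pow(1.0 + yearlyGrowthRate, years);
        return projectedDataTb * replicationFactor;
    }

    // Number of DataNodes needed to hold that raw capacity, rounded up.
    static int dataNodesNeeded(double rawStorageTb, double usableTbPerNode) {
        return (int) Math.ceil(rawStorageTb / usableTbPerNode);
    }

    public static void main(String[] args) {
        // Hypothetical inputs: 100 TB today, 50% growth per year, 2-year horizon,
        // replication factor 3, 40 TB of usable disk per DataNode.
        double rawTb = rawStorageTb(100.0, 0.5, 2, 3);
        System.out.println(rawTb);                        // 675.0
        System.out.println(dataNodesNeeded(rawTb, 40.0)); // 17
    }
}
```

The result is a DataNode count only. NameNode sizing depends mainly on the number of files and blocks it has to track in memory, not on raw capacity.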
"The right level of parallelism for maps seems to be around 10-100 maps per-node": does "node" here mean NameNode or DataNode?
It's the DataNode.
When talking about mappers, some say the number is the same as the number of splits, others say the same as the number of blocks, while others say it is determined by the framework.
It is decided by the Hadoop framework, depending on the number of input splits.
Have a look at related SE question:
How does Hadoop perform input splits?
Upvotes: 1