Reputation: 6706
I am learning Hadoop. I have started with the classic wordcount example.
I have been using this repo: https://github.com/m-semnani/bd-infra (although I only need the Hadoop part for now).
I ran the program with a small amount of data.
My question is: how do I know if I need more DataNodes once the data grows larger?
Can I set rules such as: if CPU / memory / storage usage goes above a particular limit, deploy one more DataNode (or NameNode) replica?
What would be the correct approach for this?
Upvotes: 0
Views: 100
Reputation: 191831
A general rule of thumb for HDFS is that when total cluster capacity goes beyond 80%, it is time to expand the cluster, compact data, or remove data.
However, capacity is not the only indicator of performance: after expanding and adding more data, NameNode heap usage and the total file count become a concern, and at that point you need to look at NameNode Federation (not replicas) rather than simple HDFS cluster expansion.
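
To automate the 80% check, one option is to poll the NameNode's built-in JMX servlet, which exposes capacity and file-count metrics as JSON. Below is a minimal sketch in Python; it assumes a Hadoop 3.x NameNode with its web UI on the default port 9870 (adjust the host/port to your deployment) and reads the `FSNamesystem` bean's `CapacityUsed`, `CapacityTotal`, and `FilesTotal` attributes. The thresholds are illustrative, not official limits.

```python
import json
import urllib.request

# Assumption: NameNode web UI on localhost:9870 (Hadoop 3.x default;
# Hadoop 2.x uses 50070). The qry parameter filters to one JMX bean.
NAMENODE_JMX = ("http://localhost:9870/jmx"
                "?qry=Hadoop:service=NameNode,name=FSNamesystem")

CAPACITY_THRESHOLD = 0.80        # the 80% rule of thumb from above
FILES_THRESHOLD = 100_000_000    # illustrative; tune to your NameNode heap

def check_cluster():
    with urllib.request.urlopen(NAMENODE_JMX) as resp:
        beans = json.load(resp)["beans"]
    fs = beans[0]  # the single FSNamesystem bean matched by the filter

    used, total = fs["CapacityUsed"], fs["CapacityTotal"]
    files = fs["FilesTotal"]
    usage = used / total if total else 0.0

    print(f"HDFS capacity: {usage:.1%} used ({used} / {total} bytes)")
    print(f"Files tracked by NameNode: {files}")

    if usage > CAPACITY_THRESHOLD:
        print("-> Above 80%: add DataNodes, compact, or delete data.")
    if files > FILES_THRESHOLD:
        print("-> High file count: watch NameNode heap; consider Federation.")

if __name__ == "__main__":
    check_cluster()
```

You can cross-check the same numbers on the command line with `hdfs dfsadmin -report`. In practice you would run a check like this from a monitoring system (Prometheus, Nagios, etc.) rather than a cron script, but the decision logic is the same.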
Upvotes: 1