Jack

Reputation: 5880

What's the ideal NameNode memory size when handling a large number of files in HDFS?

I will have 200 million files in my HDFS cluster. We know each file occupies about 150 bytes of NameNode memory, plus 3 blocks at ~150 bytes each, so roughly 600 bytes per file in the NN. So I set my NN memory to 250 GB to comfortably handle 200 million files. My question is: will such a big heap of 250 GB put too much pressure on GC? Is it feasible to give the NN 250 GB of memory?
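For what it's worth, here is a back-of-envelope sketch of the arithmetic above (the 150-byte figure is a widely quoted rule of thumb, not an exact constant):

```python
# Back-of-envelope NameNode heap estimate, assuming the commonly cited
# rule of thumb of ~150 bytes of heap per file object and per block object.
BYTES_PER_OBJECT = 150            # approximate, not an exact constant
num_files = 200_000_000
blocks_per_file = 3

file_meta = num_files * BYTES_PER_OBJECT                      # ~30 GB
block_meta = num_files * blocks_per_file * BYTES_PER_OBJECT   # ~90 GB
total = file_meta + block_meta                                # ~120 GB

print(f"file metadata : {file_meta / 1e9:.0f} GB")
print(f"block metadata: {block_meta / 1e9:.0f} GB")
print(f"raw total     : {total / 1e9:.0f} GB")  # a 250 GB heap leaves ~2x headroom
```

By this estimate the raw metadata footprint is around 120 GB, so 250 GB is roughly double what the objects themselves need.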

Can someone just say something? Why does nobody answer?

Upvotes: 4

Views: 323

Answers (2)

Ani Menon

Reputation: 28199

The ideal NameNode memory size is roughly the total space used by the metadata, plus memory for the OS and the daemons, plus 20-30% headroom for processing-related data.
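Plugging the question's numbers into that formula, as a rough sketch (the OS and daemon reservations below are assumed values for illustration, not measurements):

```python
# Rough sizing per the formula above. The OS/daemon figures are
# assumptions for the sketch; measure them on your own hosts.
metadata = 120       # GB of NameNode metadata (~200M files * ~600 bytes)
os_reserve = 8       # GB assumed for the operating system
daemons = 4          # GB assumed for other daemons on the host
headroom = 0.30      # 20-30% extra for processing-related data

total = (metadata + os_reserve + daemons) * (1 + headroom)
print(f"~{total:.0f} GB")  # ~172 GB, so a 256 GB host has comfortable margin
```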

You should also consider the rate at which data arrives in your cluster. If data is coming in at 1 TB/day, you must plan for more memory, or you will soon run out.
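To make the ingest-rate point concrete, here is a hypothetical sketch at 1 TB/day, assuming a 128 MB block size and the same ~150 bytes per metadata object:

```python
# Hypothetical metadata growth at 1 TB/day, assuming 128 MB blocks and
# ~150 bytes of NameNode heap per file/block object.
BYTES_PER_OBJECT = 150
BLOCK_SIZE = 128 * 1024**2          # 128 MB
daily_ingest = 1024**4              # 1 TB/day

blocks_per_day = daily_ingest // BLOCK_SIZE             # 8,192 blocks
print(f"{blocks_per_day} new blocks/day -> "
      f"~{blocks_per_day * BYTES_PER_OBJECT / 1e6:.1f} MB/day of heap")

# Small files change the picture: the same 1 TB/day as 1 MB files is
# ~1M new file+block objects per day, i.e. hundreds of MB of heap daily.
files_per_day = daily_ingest // (1024**2)               # 1,048,576 files
print(f"as 1 MB files: ~{files_per_day * 2 * BYTES_PER_OBJECT / 1e6:.0f} MB/day")
```

So it is not just the volume but the file size distribution that drives NameNode memory growth.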

It's always advised to keep at least 20% of memory free at any point in time. This helps avoid the NameNode going into a full garbage collection. As Marco mentioned, you may refer to NameNode Garbage Collection Configuration: Best Practices and Rationale for the GC configuration.

In your case, 256 GB looks good if you aren't going to ingest a lot of new data or run heavy operations on the existing data.

Refer: How to Plan Capacity for Hadoop Cluster?

Also refer: Select the Right Hardware for Your New Hadoop Cluster

Upvotes: 2

Marco99

Reputation: 1659

You can have 256 GB of physical memory in your NameNode. If your data grows to huge volumes, consider HDFS federation. I assume you already have multiple cores (with or without hyperthreading) in the NameNode host. The link below should address your GC concerns: https://community.hortonworks.com/articles/14170/namenode-garbage-collection-configuration-best-pra.html

Upvotes: 2
