grep

Reputation: 5623

Why is double the amount of memory used for NameNode files?

In the Cloudera blog and on the Hortonworks forum I read:

"Every file, directory and block in HDFS is represented as an object in the namenode’s memory, each of which occupies 150 bytes, as a rule of thumb. So 10 million files, each using a block, would use about 3 gigabytes of memory"

BUT:

10,000,000 * 150 = 1,500,000,000 bytes = 1.5 GB.

It looks like 300 bytes per file would need to be allocated to reach 3 GB. I don't understand why each file uses 300 bytes instead of 150. It's just the NameNode; the replication factor shouldn't come into it.

Thanks

Upvotes: 1

Views: 162

Answers (1)

gudok

Reputation: 4179

For every small file, the namenode needs to store two objects in memory: a per-file object and a per-block object. This works out to approximately 300 bytes per file.
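A quick back-of-the-envelope sketch of the estimate (the 150-byte figures are the rule-of-thumb values from the quote, not exact JVM measurements, and the class name is made up for illustration):

```java
public class NameNodeMemoryEstimate {
    // Rule-of-thumb sizes from the quoted guideline (assumptions, not exact):
    static final long BYTES_PER_FILE_OBJECT  = 150; // per-file metadata object
    static final long BYTES_PER_BLOCK_OBJECT = 150; // per-block metadata object

    public static void main(String[] args) {
        long files = 10_000_000L;  // 10 million small files
        long blocksPerFile = 1;    // each file fits in a single block

        long totalBytes = files * (BYTES_PER_FILE_OBJECT
                + blocksPerFile * BYTES_PER_BLOCK_OBJECT);

        // 10,000,000 * (150 + 150) = 3,000,000,000 bytes = 3.0 GB
        System.out.printf("Estimated namenode heap: %.1f GB%n", totalBytes / 1e9);
    }
}
```

So the question's arithmetic with 150 bytes covers only the file objects; adding the block objects doubles it to the 3 GB figure. Note that replication has no effect here: replicas live on datanodes, while the namenode keeps a single block object regardless of the replication factor.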

Upvotes: 2
