Reputation: 133
I am confused by a Hadoop namenode memory problem.
When namenode memory usage rises above a certain percentage (say 75%), reading and writing HDFS files through the Hadoop API starts to fail (for example, some open() calls throw exceptions). What is the reason? Has anyone seen the same thing? PS: at that time the namenode disk I/O is not high and the CPU is relatively idle.
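The failing calls look roughly like this (a minimal sketch; the path and configuration are placeholders, not my real setup):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadExample {
        public static void main(String[] args) throws IOException {
            // Picks up core-site.xml / hdfs-site.xml from the classpath
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // This open() is the kind of call that starts throwing exceptions
            // once the namenode's memory usage gets high
            Path path = new Path("/tmp/example.txt"); // placeholder path
            try (FSDataInputStream in = fs.open(path)) {
                byte[] buffer = new byte[4096];
                while (in.read(buffer) != -1) {
                    // process the bytes...
                }
            }
        }
    }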
What determines the namenode's QPS (Queries Per Second)?
Thanks very much!
Upvotes: 2
Views: 4912
Reputation: 20969
Since the namenode is basically just an RPC server managing a HashMap with the blocks, you have two major memory problems:

1. The HashMap itself is quite costly, and its collision resolution (the separate-chaining algorithm) is costly as well, because it stores collided elements in a linked list.
2. The RPC handler threads. For the datanodes the number of handlers is set with dfs.namenode.service.handler.count, which defaults to 10. For other clients, like MapReduce jobs and JobClients that want to run a job, you configure dfs.namenode.handler.count. When a request comes in and a new handler has to be created, the namenode may go out of memory (new threads also allocate a good chunk of stack space; maybe you need to increase that).

So these are the reasons why your namenode needs so much memory.
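For reference, you can check which values are in effect from the loaded configuration; a small sketch (it assumes the Hadoop client jars and your hdfs-site.xml are on the classpath; the properties themselves are normally set in hdfs-site.xml on the namenode):

    import org.apache.hadoop.conf.Configuration;

    public class HandlerCountCheck {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Pull in the HDFS-specific settings as well
            conf.addResource("hdfs-site.xml");

            // Handlers serving the datanodes (heartbeats, block reports)
            System.out.println("dfs.namenode.service.handler.count = "
                    + conf.getInt("dfs.namenode.service.handler.count", 10));

            // Handlers serving other clients (filesystem clients, JobClients, ...)
            System.out.println("dfs.namenode.handler.count = "
                    + conf.getInt("dfs.namenode.handler.count", 10));
        }
    }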
What determines the namenode's QPS (Queries Per Second)?
I haven't benchmarked it yet, so I can't give you very good tips on that. Certainly it helps to tune the handler counts higher than the number of tasks that can run in parallel, plus an allowance for speculative execution. Depending on how you submit your jobs, you have to fine-tune the other property as well.
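As a back-of-the-envelope illustration of that rule of thumb (the cluster numbers below are made up; plug in your own slot counts and speculative-execution allowance):

    public class HandlerCountEstimate {
        public static void main(String[] args) {
            // Hypothetical cluster: 20 worker nodes, 4 map + 2 reduce slots each
            int nodes = 20;
            int slotsPerNode = 4 + 2;

            // Rough allowance for extra attempts from speculative execution
            double speculativeFactor = 1.2;

            int parallelAttempts = (int) Math.ceil(nodes * slotsPerNode * speculativeFactor);

            // Rule of thumb from above: keep the client handler count above the
            // number of task attempts that can hit the namenode at the same time
            System.out.println("parallel task attempts ~ " + parallelAttempts);
            System.out.println("dfs.namenode.handler.count should be at least " + parallelAttempts);
        }
    }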
Of course you should always give the namenode enough memory, so it has headroom and does not fall into full garbage-collection cycles.
Upvotes: 1