ablimit

Reputation: 2361

Hadoop node & core allocation strategy

I have a cluster with 50 nodes, and each node has 8 cores available for computation. If I have a job for which I plan to impose 200 reducers, what would be a good computational resource allocation strategy for better performance?

I mean, is it better to allocate 50 nodes with 4 cores each, or 25 nodes with 8 cores each? Which one is better in which case?
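
For reference, here is roughly how I impose the 200 reducers (a minimal sketch against the standard org.apache.hadoop.mapreduce API; the driver class name, job name, and paths are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MyJobDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "my-200-reducer-job");
            job.setJarByClass(MyJobDriver.class);

            // Impose 200 reduce tasks. Note that 50 nodes x 4 slots and
            // 25 nodes x 8 slots both give 200 reduce slots, so either
            // allocation can run all reducers in a single wave.
            job.setNumReduceTasks(200);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }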

Upvotes: 1

Views: 322

Answers (2)

Donald Miner

Reputation: 39893

To answer your question: it depends on a few things. In my opinion, the 50 nodes are generally going to be better:

  • If you are reading a lot of data off disk, 50 nodes will be better because you parallelize the disk I/O across twice as many machines.
  • If you are computing and processing over a lot of data, 50 nodes will be better, because performance doesn't scale 1:1 with the number of cores in a node (i.e., 2x as many cores is not quite 2x as fast), while adding more nodes does scale close to 1:1.
  • Hadoop has to run the TaskTracker and DataNode daemons on each node, as well as the OS-level processes. Those "take up" cores as well (see the configuration sketch after this list).
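
To illustrate that last point: on Hadoop 1.x the per-node slot counts are set on each worker in mapred-site.xml, and you'd typically size them to leave a core or two free for the TaskTracker and DataNode daemons. A sketch for the 4-slots-per-node case (the exact map/reduce split is an assumption you'd tune for your workload):

    <!-- mapred-site.xml on each worker node -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>4</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>4</value>
    </property>

With 4 reduce slots on each of 50 nodes, you get exactly 200 reduce slots for your 200 reducers.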

However, if your main concern is the network, here are a few downsides of having 50 nodes:

  • Likely, 50 nodes are going to span more than one rack. Are they on a flat network, or do you have to deal with inter-rack communication? You'll have to set up Hadoop's rack awareness accordingly (see the sketch after this list);
  • A network switch supporting 50 nodes is going to be more expensive than one that supports 25;
  • The network shuffle between the map and the reduce will put a bit more load on the switch in the 50-node cluster, but roughly the same total amount of data will pass through the network either way.
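
On the rack-awareness point from the first bullet: Hadoop learns the topology either from a script named by topology.script.file.name, or from a Java class that implements DNSToSwitchMapping and is registered as topology.node.switch.mapping.impl in core-site.xml. A minimal sketch, assuming a hypothetical hostname convention where hosts are named rack1-*, rack2-*, and so on:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.net.DNSToSwitchMapping;

    // Maps hostnames to rack paths based on a naming convention.
    // The rack1-/rack2- prefixes are an assumption; adapt to your network.
    public class SimpleRackMapping implements DNSToSwitchMapping {
        @Override
        public List<String> resolve(List<String> names) {
            List<String> racks = new ArrayList<String>();
            for (String host : names) {
                if (host.startsWith("rack1-")) {
                    racks.add("/rack1");
                } else if (host.startsWith("rack2-")) {
                    racks.add("/rack2");
                } else {
                    racks.add("/default-rack");
                }
            }
            return racks;
        }

        public void reloadCachedMappings() {
            // Nothing cached in this sketch.
        }
    }

With this in place, replica placement and task scheduling become rack-aware, which is what mitigates the inter-rack traffic.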

Even with these network concerns, I think you'll find that the 50 nodes are better, because the value of a node is not just its number of cores. Mostly, you have to consider how many disks you have.

Upvotes: 1

Thomas Jungblut

Reputation: 20969

It is hard to say; usually, "the more, the better" holds. More machines are also better for tolerating failures.

Usually Hadoop is fine with commodity hardware, so you can pick the 50 servers with 4 cores each.

But I would pick the 8-core machines if they had superior hardware, e.g. a higher CPU frequency, DDR3 RAM, or 10k RPM disks.

Upvotes: 1
