user9332151

Reputation: 57

Tuning Hadoop parameters

Is there a way to fine-tune Hadoop configuration parameters without having to run tests for every possible combination? I am currently working on an 8-node cluster and I want to optimize the performance of MapReduce jobs as well as Spark jobs (running on top of HDFS).

Upvotes: 1

Views: 696

Answers (1)

Petro

Reputation: 3642

The short answer is NO. You need to play around and run smoke tests (for example, the TestDFSIO and TeraSort benchmark jobs that ship with Hadoop) to determine optimal performance for your cluster, so I would start with the topics below.

Some topics that will affect MapReduce jobs (see the hdfs-site.xml sketch after this list):

  • Configure the HDFS block size for optimal performance
  • Avoid files that are smaller than a block
  • Tune the DataNode JVM for optimal performance
  • Enable HDFS short-circuit reads
  • Avoid reads or writes from stale DataNodes
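
As a rough sketch of where those settings live, here is a minimal hdfs-site.xml fragment covering the block-size, short-circuit-read, and stale-DataNode items above. Every value is a placeholder to test against your own workload, not a recommendation; the DataNode JVM heap is tuned separately, via the DataNode JVM options in hadoop-env.sh (HADOOP_DATANODE_OPTS).

    <!-- hdfs-site.xml: illustrative values only; smoke-test before adopting -->
    <property>
      <name>dfs.blocksize</name>
      <value>268435456</value> <!-- 256 MB; match the block size to your typical file size -->
    </property>
    <property>
      <name>dfs.client.read.shortcircuit</name>
      <value>true</value> <!-- enable HDFS short-circuit reads -->
    </property>
    <property>
      <name>dfs.domain.socket.path</name>
      <value>/var/lib/hadoop-hdfs/dn_socket</value> <!-- local socket required by short-circuit reads -->
    </property>
    <property>
      <name>dfs.namenode.avoid.read.stale.datanode</name>
      <value>true</value> <!-- skip stale DataNodes for reads -->
    </property>
    <property>
      <name>dfs.namenode.avoid.write.stale.datanode</name>
      <value>true</value> <!-- skip stale DataNodes for writes -->
    </property>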

To give you an idea of how a 4-node cluster with 32 cores and 128 GB RAM per node is set up in YARN/TEZ (from Hadoop multinode cluster too slow. How do I increase speed of data processing?):

For Tez: divide RAM by cores to get the max Tez container size. So in my case: 128 GB / 32 cores = 4 GB (see the tez-site.xml sketch below).

TEZ: (screenshot of the Tez configuration settings omitted)
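
As a rough illustration of that arithmetic, here is a minimal tez-site.xml sketch, assuming the 4 GB (4096 MB) container size computed above. The io.sort value is a common rule-of-thumb fraction, not a value from the original screenshot.

    <!-- tez-site.xml: a sketch of the RAM/cores arithmetic (128 / 32 = 4 GB = 4096 MB) -->
    <property>
      <name>tez.am.resource.memory.mb</name>
      <value>4096</value> <!-- Tez ApplicationMaster container size -->
    </property>
    <property>
      <name>tez.task.resource.memory.mb</name>
      <value>4096</value> <!-- per-task container size -->
    </property>
    <property>
      <name>tez.runtime.io.sort.mb</name>
      <value>1638</value> <!-- assumption: ~40% of the container size, a common rule of thumb -->
    </property>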


YARN:

I like to run with the maximum RAM I can spare per node in YARN. Mine is a little higher than the recommendations, but the recommended values caused crashes in TEZ/MR jobs, so 76 GB works better in my case (see the yarn-site.xml sketch below). You need to play with all these values!

(screenshot of the YARN configuration settings omitted)
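
A hedged yarn-site.xml sketch of those settings, assuming the 76 GB per node and 32 cores mentioned above; treat every number as a starting point to experiment with, not a recommendation:

    <!-- yarn-site.xml: assumes 76 GB and 32 cores given to YARN per node -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>77824</value> <!-- 76 GB of RAM offered to containers per node -->
    </property>
    <property>
      <name>yarn.nodemanager.resource.cpu-vcores</name>
      <value>32</value> <!-- vcores offered to containers per node -->
    </property>
    <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>4096</value> <!-- matches the 4 GB container size computed above -->
    </property>
    <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>77824</value> <!-- a single container may use up to a full node's RAM -->
    </property>

Keeping the scheduler's minimum allocation equal to the Tez container size means containers line up evenly with node memory, so less RAM is wasted to rounding.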

Upvotes: 1
