Reputation: 57
Is there a way to fine-tune Hadoop configuration parameters without having to run tests for every possible combination? I am currently working on an 8-node cluster, and I want to optimize the performance of MapReduce jobs as well as Spark (running on top of HDFS).
Upvotes: 1
Views: 696
Reputation: 3642
The short answer is NO. You need to play around and run smoke tests to determine the optimal configuration for your cluster. So I would start by checking out these links:

Some topics discussed there that will affect MapReduce jobs:
To give you an idea, here is how a 4-node cluster with 32 cores and 128 GB RAM per node is set up for YARN/Tez (from "Hadoop multinode cluster too slow. How do I increase speed of data processing?"):
For Tez: divide RAM by cores to get the max Tez container size. So in my case: 128 GB / 32 cores = 4 GB.
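As a minimal sketch of where that number might land, assuming the standard tez-site.xml property tez.task.resource.memory.mb (the answer only gives the arithmetic, not the property name; Hive on Tez would use hive.tez.container.size instead):

```xml
<!-- tez-site.xml: property name is an assumption; the value comes from the math above -->
<!-- 128 GB RAM / 32 cores = 4 GB = 4096 MB per Tez container -->
<property>
  <name>tez.task.resource.memory.mb</name>
  <value>4096</value>
</property>
```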
YARN:
I like to give YARN the maximum RAM I can spare per node. Mine is a little higher than the recommendations, but the recommended values cause crashes in Tez/MR jobs, so 76 GB works better in my case. You need to play with all these values!
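A sketch of how that 76 GB figure could be expressed in yarn-site.xml (the property names are my assumption; the answer only states the per-node total):

```xml
<!-- yarn-site.xml: illustrative values only; 76 GB = 77824 MB handed to YARN per node -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>77824</value>
</property>
<!-- cap a single container at the node total; floor at one Tez container -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>77824</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>4096</value>
</property>
```

Note that changes to yarn.nodemanager.* settings require a NodeManager restart to take effect.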
Upvotes: 1