Reputation: 45
I need some help figuring out the correct number of Map and Reduce tasks in my application. Is there any way to discover these numbers?
Thanks
Upvotes: 2
Views: 3426
Reputation: 165
The number of mappers depends on the HDFS block size of the input file (by default), or on the input split size if you specify one other than the default.
For example, if you have a 128 MB file and the HDFS block size is 64 MB, the number of map tasks will be 2 under the default behaviour.
If instead you set the input split size to 32 MB while the HDFS block size is 64 MB, the number of map tasks will be 4. So the number of map tasks depends on the factors described above.
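The arithmetic above can be sketched in plain Java (no Hadoop dependencies; the class and method names are made up for illustration):

```java
public class MapTaskEstimate {
    // Number of map tasks = number of input splits = ceil(fileSize / splitSize).
    public static long numMapTasks(long fileSizeBytes, long splitSizeBytes) {
        return (fileSizeBytes + splitSizeBytes - 1) / splitSizeBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 128 MB file, default 64 MB block size used as the split size -> 2 map tasks
        System.out.println(numMapTasks(128 * mb, 64 * mb));
        // Same file with an explicit 32 MB input split size -> 4 map tasks
        System.out.println(numMapTasks(128 * mb, 32 * mb));
    }
}
```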
The number of reduce tasks depends on conf.setNumReduceTasks(num) or the mapreduce.job.reduces property (mapred.reduce.tasks is deprecated).
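The mapreduce.job.reduces property can also be set in job or cluster configuration; a minimal sketch of such a snippet (the value 4 is an arbitrary example):

```xml
<!-- e.g. in mapred-site.xml or a per-job configuration file -->
<property>
  <name>mapreduce.job.reduces</name>
  <value>4</value>
</property>
```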
Upvotes: 3
Reputation: 18270
It is not possible to get the exact number of map and reduce tasks for an application before its execution, since task failures followed by re-attempts, as well as speculative execution attempts, cannot be determined in advance. However, an approximate number of tasks can be derived.
The total number of Map tasks for a MapReduce job depends on its Input files and their FileFormat.
For each input file, splits are computed and one map task per input split will be invoked.
The split size will be calculated based on,
input_split_size = max(mapreduce.input.fileinputformat.split.minsize, min(mapreduce.input.fileinputformat.split.maxsize, dfs.blocksize))
If the properties mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize are at their defaults, the input split size for a file will be approximately equal to its block size, provided the file is splittable.
The total number of map tasks will be equal to sum of number of input splits per file.
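The split-size formula above can be sketched in plain Java (no Hadoop dependencies; the default values mirror minsize = 1 and an effectively unbounded maxsize, and the helper names are illustrative — Hadoop's actual FileInputFormat also applies a small slop factor to the last split, so this is an approximation):

```java
public class SplitSize {
    // input_split_size = max(minsize, min(maxsize, blocksize))
    public static long splitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // One map task per split; splits per file ~= ceil(fileSize / splitSize).
    public static long splitsForFile(long fileSizeBytes, long splitSizeBytes) {
        return (fileSizeBytes + splitSizeBytes - 1) / splitSizeBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // With default minsize (1) and unbounded maxsize,
        // the split size equals the block size.
        long split = splitSize(1L, Long.MAX_VALUE, 128 * mb);
        System.out.println(split == 128 * mb);              // true
        // A 300 MB splittable file with 128 MB splits -> 3 map tasks.
        System.out.println(splitsForFile(300 * mb, split)); // 3
    }
}
```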
The total number of reduce tasks is 1 (the default) or equal to mapreduce.job.reduces.
Upvotes: 3
Reputation: 1006
The number of map tasks is equal to the number of input splits in any job, so you can find either one to determine the other; the number of reducers you can set explicitly. Moreover, once you run the MapReduce job, you can inspect the generated logs to find out the number of mappers and reducers in your job.
Upvotes: 1