Leonardo Steil

Reputation: 45

Number of MapReduce tasks

I need some help figuring out how to get the correct number of Map and Reduce tasks in my application. Is there any way to discover this number?

Thanks

Upvotes: 2

Views: 3426

Answers (3)

Hari Singh

Reputation: 165

The number of mappers depends (by default) on the HDFS block size of the input file, and on the input split size if you specify something other than the default.

Suppose you have a 128MB file and the HDFS block size is 64MB; then the number of map tasks will be 2, because of the default behaviour.

If your input split size is 32MB while the HDFS block size is 64MB, then the number of map tasks will be 4. So the number of map tasks depends on all three factors: file size, block size, and split size.

The number of reduce tasks depends on conf.setNumReduceTasks(num) or mapreduce.job.reduces (mapred.reduce.tasks is deprecated).
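For reference, a minimal sketch of both ways to set the reduce count with the new MapReduce API (the job name and the value 4 are arbitrary; mapper/reducer classes and input/output paths are omitted):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ReducerCountExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Option 1: set the property directly
            // (equivalent to passing -D mapreduce.job.reduces=4 on the command line)
            conf.setInt("mapreduce.job.reduces", 4);

            Job job = Job.getInstance(conf, "reducer-count-example");

            // Option 2: set it programmatically on the Job
            // (overrides the property set above)
            job.setNumReduceTasks(4);
        }
    }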

Upvotes: 3

franklinsijo

Reputation: 18270

It is not possible to get the actual number of map and reduce tasks for an application before its execution, since task failures (followed by re-attempts) and speculative execution attempts cannot be accurately determined beforehand. However, an approximate number of tasks can be derived.

The total number of Map tasks for a MapReduce job depends on its Input files and their FileFormat.
For each input file, splits are computed and one map task per input split will be invoked.

The split size is calculated as:

input_split_size = max(mapreduce.input.fileinputformat.split.minsize, min(mapreduce.input.fileinputformat.split.maxsize, dfs.blocksize))

If the properties

  • mapreduce.input.fileinputformat.split.minsize

  • mapreduce.input.fileinputformat.split.maxsize

    are left at their defaults, the input split size for a file will be approximately equal to its block size, provided the file is splittable.

The total number of map tasks will be equal to the sum of the number of input splits per file.
The total number of reduce tasks is 1 (the default) or equal to mapreduce.job.reduces.
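As an illustration of the formula above, here is a rough estimate in plain Java. The defaults for min/max split size, the 128 MB block size, and the two file sizes are assumptions; the real FileInputFormat also applies a small slop factor when creating the last split, so treat this as an approximation:

    public class SplitEstimate {
        static long computeSplitSize(long minSize, long maxSize, long blockSize) {
            // input_split_size = max(minsize, min(maxsize, blocksize))
            return Math.max(minSize, Math.min(maxSize, blockSize));
        }

        public static void main(String[] args) {
            long minSize = 1L;                    // mapreduce.input.fileinputformat.split.minsize (default)
            long maxSize = Long.MAX_VALUE;        // mapreduce.input.fileinputformat.split.maxsize (default)
            long blockSize = 128L * 1024 * 1024;  // dfs.blocksize (assumed 128 MB)

            long splitSize = computeSplitSize(minSize, maxSize, blockSize);

            // Two splittable input files of 300 MB and 50 MB (illustrative sizes)
            long[] fileSizes = {300L * 1024 * 1024, 50L * 1024 * 1024};
            long totalMapTasks = 0;
            for (long size : fileSizes) {
                // One map task per split; each file is split independently.
                totalMapTasks += (long) Math.ceil((double) size / splitSize);
            }
            System.out.println("Estimated map tasks: " + totalMapTasks); // 3 + 1 = 4
        }
    }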

Upvotes: 3

siddhartha jain

Reputation: 1006

The number of map tasks is equal to the number of input splits in any job, so finding one gives you the other, and the number of reducers you can set explicitly. Moreover, once you run the MapReduce job, you can check the generated logs to find out the number of mappers and reducers used in your job.
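To read those numbers programmatically after the job finishes (rather than digging through the logs), a minimal sketch using the standard JobCounter counters, assuming you already have a configured Job object; note that the "launched" counts include any re-attempts and speculative attempts:

    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.JobCounter;

    public class TaskCounts {
        public static void printTaskCounts(Job job) throws Exception {
            // Block until the job finishes, printing progress to the console.
            job.waitForCompletion(true);

            long maps = job.getCounters()
                           .findCounter(JobCounter.TOTAL_LAUNCHED_MAPS).getValue();
            long reduces = job.getCounters()
                              .findCounter(JobCounter.TOTAL_LAUNCHED_REDUCES).getValue();

            System.out.println("Launched map tasks:    " + maps);
            System.out.println("Launched reduce tasks: " + reduces);
        }
    }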

Upvotes: 1
