map task input data

Question

I am new to map/reduce. Is it possible for the input of one map task to be on different servers? Assume I want to simulate "word count" using map/reduce and I split the data line by line (each line is one piece). Is it true that each map task will refer to one piece of data and count the number of occurrences of each word in that piece?

RGC · Accepted Answer

The input file will be split based on the HDFS block size, and exactly one map task will be spawned for each of these splits.
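
As a rough sketch of that rule, the snippet below cuts a file size into block-size chunks and counts them. It is a simplification, not Hadoop's actual code: the function name `split_sizes` is made up for illustration, and real Hadoop also honors minimum/maximum split-size settings and lets the last split be slightly larger than a block.

```python
MB = 1024 * 1024

def split_sizes(file_size, block_size=64 * MB):
    """Return the sizes (in bytes) of the chunks a file is cut into.

    Hypothetical helper: assumes one split per block and ignores
    Hadoop's min/max split-size settings.
    """
    sizes = []
    remaining = file_size
    while remaining > 0:
        chunk = min(block_size, remaining)  # last chunk may be smaller
        sizes.append(chunk)
        remaining -= chunk
    return sizes

splits = split_sizes(100 * MB)
# One map task per split.
print(len(splits), [s // MB for s in splits])  # prints: 2 [64, 36]
```

A file smaller than one block yields a single split, and therefore a single map task.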

For example, the default HDFS block size is 64 MB in Hadoop 1.x (128 MB in later releases). Let's say your input file is 100 MB. When you load this file into HDFS, it is stored as 2 blocks, one of 64 MB and one of 36 MB, which gives 2 input splits. Hence 2 map tasks will be spawned, one for each input split. A file smaller than one block, say 50 MB, would fit in a single block and get a single map task. Let's assume that one input split has 100 lines; then the mapper class (task) will call the map method 100 times, once for each line.
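
To make the "one map call per line" point concrete, here is a small plain-Python simulation of word count. It does not use Hadoop; the names `map_fn`, `run_map_task`, and `reduce_counts` are invented for this sketch, and it assumes each split is just a list of lines.

```python
from collections import Counter

def map_fn(line):
    """Emit a (word, 1) pair for every word in one line."""
    return [(word, 1) for word in line.split()]

def run_map_task(split_lines):
    """Simulate one map task: call map_fn once per line of its split."""
    calls = 0
    pairs = []
    for line in split_lines:
        pairs.extend(map_fn(line))
        calls += 1
    return calls, pairs

def reduce_counts(all_pairs):
    """Sum the 1s per word across the output of all map tasks."""
    totals = Counter()
    for word, n in all_pairs:
        totals[word] += n
    return totals

# Two hypothetical input splits, each handled by its own map task.
split_a = ["the quick fox", "the lazy dog"]
split_b = ["the end"]

calls_a, pairs_a = run_map_task(split_a)
calls_b, pairs_b = run_map_task(split_b)
totals = reduce_counts(pairs_a + pairs_b)

print(calls_a, calls_b, totals["the"])  # prints: 2 1 3
```

Each map task only sees the words in its own split; the totals across the whole file come from the reduce step.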
