Reputation: 6465
Assume a map-reduce job with m mappers which is fed by an input file F. Apparently the MapReduce framework splits F into chunks (64 MB by default) and feeds each chunk to a mapper. My question is: if I run this MapReduce job several times, are the chunks formed the same way each time? That is, do the points at which the MapReduce framework splits F remain the same, or may they differ?
As an example, assume F contains the following lines:
1,2
3,5
5,6
7,6
5,5
7,7
In the first run, MapReduce forms two chunks as follows:
Chunk 1:
1,2
3,5
5,6
Chunk 2:
7,6
5,5
7,7
My question is whether the way the split is done remains the same if I run it again.
Besides, does each chunk have a unique name that can be used in the mapper?
Upvotes: 0
Views: 1316
Reputation: 34184
My question is whether the way the split is done remains the same if I run it again.
It is true that the input data gets split into chunks first and that each of these chunks is fed to a mapper, but the chunk size is not always 64 MB. Perhaps you have confused the HDFS block (usually 64 MB) with the MR split; the two are totally different things, although it is possible for your split size and block size to be the same.
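For context, with the commonly used FileInputFormat the split size is derived from the HDFS block size together with the configured minimum and maximum split sizes, which is why the two values often coincide. A minimal sketch of that relationship (the formula mirrors FileInputFormat's computeSplitSize; the concrete numbers are only examples):
// Sketch of how FileInputFormat-style split sizing works:
// splitSize = max(minSplitSize, min(maxSplitSize, blockSize))
long blockSize    = 64L * 1024 * 1024;  // HDFS block size, e.g. 64 MB
long minSplitSize = 1L;                 // mapreduce.input.fileinputformat.split.minsize
long maxSplitSize = Long.MAX_VALUE;     // mapreduce.input.fileinputformat.split.maxsize
long splitSize = Math.max(minSplitSize, Math.min(maxSplitSize, blockSize));
// With these defaults splitSize equals blockSize, so splits line up with HDFS blocks.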
Coming to your actual question: yes, it is the same for all jobs that use the same InputFormat. The reason is that creating the splits is the job of the InputFormat you are using. To be precise, the logic inside getSplits(JobContext context) of your InputFormat governs split creation. So, if it is the same in all the jobs, split creation will also be the same.
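Because split creation is driven entirely by the InputFormat and the job configuration, you can also pin the split size down explicitly in the driver. A small sketch, assuming the new org.apache.hadoop.mapreduce API and TextInputFormat (the class name and values are illustrative):

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class SplitConfigDriver {
    public static void main(String[] args) throws Exception {
        // With the same input file, the same InputFormat and the same split
        // settings, getSplits() produces the same splits on every run.
        Job job = Job.getInstance();
        job.setInputFormatClass(TextInputFormat.class);
        // Fix both bounds to 64 MB so the split size cannot vary.
        FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);
        FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);
    }
}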
Besides, does each chunk have a unique name that can be used in the mapper?
Each chunk (split) has two things: a length in bytes and a set of storage locations (hostname strings). It does not carry a unique name of its own, but for file-based input you can get at the underlying file name, as shown below.
Edit:
How to get the name of the file being processed by the mapper:
// For file-based input formats, the mapper's input split is a FileSplit,
// so it can be cast to obtain the path of the underlying file.
FileSplit fileSplit = (FileSplit) context.getInputSplit();
String filename = fileSplit.getPath().getName();
Now you can open an FSDataInputStream on this file and read its contents.
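For completeness, here is a minimal sketch of a mapper that picks up the file name in setup() and reuses it in map(); the class and the emitted key/value are illustrative, not taken from your job:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class FileNameMapper extends Mapper<LongWritable, Text, Text, Text> {
    private String filename;

    @Override
    protected void setup(Context context) {
        // The InputSplit handed to this mapper says which file (and byte range) it covers.
        FileSplit fileSplit = (FileSplit) context.getInputSplit();
        filename = fileSplit.getPath().getName();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit the file name alongside each record, just to show it is available here.
        context.write(new Text(filename), value);
    }
}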
Hope it answers your query.
Upvotes: 1