Reputation: 23

What happens when I set the split size greater than the actual block size in the MapReduce framework?

As far as I know, one mapper is allocated per split.

But what happens when I set the split size greater than the actual block size?

For example: if the block size is 128 MB and the split size is 130 MB, how many mappers will run in this case? Is it one mapper or more than one?

Upvotes: 2

Answers (2)

Avinash Ganta

Reputation: 163

If an InputSplit exceeds the HDFS block size, the mapper ends up reading data from multiple blocks.
In your example, with block size = 128 MB and a calculated split size of 130 MB, one map task will be generated, and it will read from two different blocks.

How exactly these two blocks are read is abstracted away by the HDFS layer.
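To make the "two different blocks" point concrete, here is a minimal sketch of the arithmetic only. It is not Hadoop code: `blocks_touched` is a hypothetical helper, and it assumes the split starts at a known byte offset in a file stored in fixed-size blocks.

```python
MB = 1024 * 1024


def blocks_touched(split_start, split_length, block_size):
    """Return the indices of the fixed-size blocks a byte range overlaps.

    Hypothetical helper for illustration; not part of any Hadoop API.
    """
    if split_length <= 0:
        return []
    first = split_start // block_size
    # Last byte of the split is at offset split_start + split_length - 1.
    last = (split_start + split_length - 1) // block_size
    return list(range(first, last + 1))


# A 130 MB split starting at offset 0, over 128 MB blocks.
print(blocks_touched(0, 130 * MB, 128 * MB))  # [0, 1]
```

The single split covers all of block 0 plus the first 2 MB of block 1, so the one map task has to read from both.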

Upvotes: 1

fi11er

Reputation: 679

It is possible to set the split size larger than the block size. But in that case, to read one split the mapper has to read several blocks from HDFS, which can cause network transfer, because block n and block n+1 may not be located on the same DataNode.

In your example, if you set the split size to 130 MB and your input data is one 130 MB file, then you will have 1 mapper.
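If you want to check the mapper count for other file sizes, the following is a rough Python model of how I understand `FileInputFormat` to derive splits for a single splittable, non-empty file: the split size is `max(minSize, min(maxSize, blockSize))`, and the last split is allowed to be up to 10% larger than that. The function names are my own stand-ins, not Hadoop APIs; in a real job the min and max sizes come from `mapreduce.input.fileinputformat.split.minsize` and `mapreduce.input.fileinputformat.split.maxsize`.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # assumed: the final split may be up to 10% over split size


def compute_split_size(block_size, min_size, max_size):
    """Model of the split-size rule: max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_length, split_size):
    """Count splits (one mapper each) for one splittable, non-empty file."""
    splits = 0
    remaining = file_length
    # Carve off full-size splits while clearly more than one split remains.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left (up to 1.1 x split size) becomes the final split.
    if remaining != 0:
        splits += 1
    return splits


# Raising the minimum split size to 130 MB pushes it above the 128 MB block.
split_size = compute_split_size(128 * MB, 130 * MB, 2**63 - 1)
print(split_size // MB)                      # 130
print(count_splits(130 * MB, split_size))    # 1
print(count_splits(260 * MB, split_size))    # 2
```

Under this model a 130 MB file would also get a single mapper with a 128 MB split size, because 130 / 128 is below the 1.1 threshold, so the setting only starts to change the mapper count for larger inputs.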

Upvotes: 0
