Reputation: 23
As far as I know, one mapper is allocated per input split.
But what happens when I set the split size larger than the actual block size?
For example, if the block size is 128 MB and the split size is 130 MB, how many mappers will run in this case? Will it be one mapper or more than one?
Upvotes: 2
Views: 789
Reputation: 163
If an InputSplit is larger than the HDFS block size, the mapper ends up reading data from multiple blocks.
In your example, with block size = 128 MB and a calculated split size of 130 MB, a 130 MB split is handled by a single map task, which reads from two different blocks.
How exactly these two blocks are read is abstracted away by the HDFS layer.
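To make the "calculated split size" concrete, here is a rough Python sketch of the arithmetic that Hadoop's `FileInputFormat` applies to a splittable, non-empty file: the split size is `max(minSize, min(maxSize, blockSize))`, and the last chunk is folded into the previous split when it is within a 10% slop. The function names and the example values are my own for illustration; the real logic lives in the Java class, and in practice you would raise `mapreduce.input.fileinputformat.split.minsize` to get a split larger than a block.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # FileInputFormat lets the last split run up to 10% over

def compute_split_size(block_size, min_size, max_size):
    # Mirrors max(minSize, min(maxSize, blockSize))
    return max(min_size, min(max_size, block_size))

def count_splits(file_size, split_size):
    # Sketch only: assumes a splittable file with file_size > 0
    splits = 0
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining != 0:
        splits += 1  # the final (possibly slightly oversized) split
    return splits

# Example from the question: 128 MB blocks, minimum split size raised to 130 MB
split_size = compute_split_size(128 * MB, 130 * MB, 2**63 - 1)
print(split_size // MB)                      # split size in MB
print(count_splits(130 * MB, split_size))    # mappers for a 130 MB file
print(count_splits(260 * MB, split_size))    # mappers for a 260 MB file
```

So the number of mappers depends on the file size as well as the split size: one split per 130 MB of input, give or take the slop on the last one.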
Upvotes: 1
Reputation: 679
It is possible to set the split size larger than the block size. In that case, however, the mapper has to read several blocks from HDFS to get one split, which can cause network transfer, because block n and block n+1 may not be located on the same DataNode.
In your example, if you set the split size to 130 MB and your input data is a single 130 MB file, you will have 1 mapper.
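To see why several blocks get involved, here is a small Python sketch that maps a split's byte range onto block indices. `blocks_for_split` is a hypothetical helper I made up for illustration, not a Hadoop API; it only does the offset arithmetic and says nothing about which DataNodes actually hold the blocks.

```python
MB = 1024 * 1024

def blocks_for_split(start, length, block_size):
    # Indices of the blocks that the byte range [start, start + length) touches
    first = start // block_size
    last = (start + length - 1) // block_size
    return list(range(first, last + 1))

block_size = 128 * MB
split_size = 130 * MB

# Single 130 MB file: one split, which touches blocks 0 and 1
print(blocks_for_split(0, split_size, block_size))

# Second split of a 260 MB file: touches blocks 1 and 2
print(blocks_for_split(split_size, split_size, block_size))
```

Each 130 MB split straddles a block boundary, so whichever node runs the mapper, at least part of the data may have to come over the network.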
Upvotes: 0