Reputation: 13666
My cluster's HDFS block size is 64 MB. I have a directory containing 100 plain text files, each of which is 100 MB in size. The InputFormat for the job is TextInputFormat. How many Mappers will run?
I saw this question in a Hadoop Developer exam. The given answer is 100; the other three options were 64, 640, and 200. But I am not sure how 100 is arrived at, or whether the answer is simply wrong.
Please guide. Thanks in advance.
Upvotes: 1
Views: 1683
Reputation: 11
Each file would be split into two because the block size (64 MB) is less than the file size (100 MB), so 200 mappers would run.
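For illustration, a minimal sketch of that arithmetic (assuming the default behaviour where the split size equals the block size and the files are uncompressed; the class name is just for the example):

    public class SplitCountSketch {
        public static void main(String[] args) {
            long blockSize = 64L * 1024 * 1024;   // 64 MB HDFS block size
            long fileSize  = 100L * 1024 * 1024;  // 100 MB per plain text file
            int  numFiles  = 100;

            // Each file yields ceil(100 MB / 64 MB) = 2 splits
            // (this ignores FileInputFormat's ~10% slop on the final split,
            //  which makes no difference at these sizes).
            long splitsPerFile = (fileSize + blockSize - 1) / blockSize;

            // One map task per input split -> 100 files * 2 splits = 200 mappers.
            System.out.println(numFiles * splitsPerFile); // prints 200
        }
    }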
Upvotes: 0
Reputation: 30089
I would agree with your assessment that this appears wrong, unless of course there is more to the exam question than was posted.
To be fair to the exam question and its 'correct' answer, we would need the question in its full entirety.
The correct answer should be 200 (if the block size is the default 64 MB for all the files, and the files are either not compressed, or compressed with a splittable codec such as bzip2).
Upvotes: 4
Reputation:
The answer looks wrong to me.
But it may be correct in the scenarios below:
1) If we override the isSplitable method to return false, then the number of map tasks will be the same as the number of input files. In that case it will be 100 (a minimal sketch of such an input format is at the end of this answer).
2) If we configure the mapred.min.split.size and mapred.max.split.size properties. By default, the min split size is 0 and the max split size is Long.MAX_VALUE.
Below is the formula used to work out the split size (and hence the number of mappers); a small sketch of it is just below:
max(mapred.min.split.size, min(mapred.max.split.size, blocksize))
In this scenario, if we configure mapred.min.split.size as 100 MB, then we will have 100 mappers.
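A rough sketch of that formula in Java (the class name and values are only illustrative; this mirrors the shape of the formula above rather than Hadoop's actual source):

    public class SplitSizeSketch {
        // Same shape as max(mapred.min.split.size, min(mapred.max.split.size, blocksize))
        static long computeSplitSize(long minSize, long maxSize, long blockSize) {
            return Math.max(minSize, Math.min(maxSize, blockSize));
        }

        public static void main(String[] args) {
            long blockSize = 64L * 1024 * 1024;   // 64 MB
            long fileSize  = 100L * 1024 * 1024;  // 100 MB

            // Defaults (min = 0, max = Long.MAX_VALUE): split size = block size
            long defaultSplit = computeSplitSize(0, Long.MAX_VALUE, blockSize);
            System.out.println((fileSize + defaultSplit - 1) / defaultSplit); // 2 splits/file -> 200 mappers

            // Scenario 2: mapred.min.split.size = 100 MB forces the split size up to 100 MB
            long forcedSplit = computeSplitSize(100L * 1024 * 1024, Long.MAX_VALUE, blockSize);
            System.out.println((fileSize + forcedSplit - 1) / forcedSplit);   // 1 split/file -> 100 mappers
        }
    }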
But according to the given information, I think 100 is not the right answer.
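For scenario 1, here is a minimal sketch (using the newer mapreduce API; the class name is hypothetical) of an input format whose isSplitable always returns false:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    // Returning false from isSplitable means each file becomes exactly one
    // input split, so 100 input files -> 100 map tasks.
    public class WholeFileTextInputFormat extends TextInputFormat {
        @Override
        protected boolean isSplitable(JobContext context, Path file) {
            return false; // never split, regardless of file size vs. block size
        }
    }

The job would then pick it up via job.setInputFormatClass(WholeFileTextInputFormat.class).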
Upvotes: 0