Umesh Kacha

Reputation: 13666

Hadoop Mapper: Appropriate input files size?

My cluster's HDFS block size is 64 MB. I have a directory containing 100 plain text files, each of which is 100 MB in size. The InputFormat for the job is TextInputFormat. How many Mappers will run?

I saw this question in a Hadoop Developer exam. The given answer is 100. The other three answer options were 64, 640, and 200. I am not sure how 100 is derived, or whether the answer is wrong.

Please guide. Thanks in advance.

Upvotes: 1

Views: 1683

Answers (3)

KAPIL TANDON

Reputation: 11

Each file would be split into two, because the block size (64 MB) is less than the file size (100 MB), so 200 mappers would run.

Upvotes: 0

Chris White

Reputation: 30089

I would agree with your assessment that this appears wrong.

Unless, of course, there is more to the exam question that was not posted:

  • Are these 'plain' text files gzip compressed (in which case they are not splittable)?
  • The cluster block size may be 64 MB, but what block size were the input files actually written with? 128 MB?

To be fair to the exam question and its 'correct' answer, we would need to see the exam question in its entirety.

The correct answer should be 200 (if the file block sizes are all the default 64 MB, and the files are either not compressed or compressed with a splittable codec such as bzip2).
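The 200 figure can be checked with a little arithmetic. The following is only a rough sketch of the split-counting loop as I understand FileInputFormat to perform it, not a copy of the Hadoop source: the class name `SplitCountSketch` and the method `countSplits` are made up for illustration, and the 1.1 slack factor is an assumption based on my recollection of Hadoop's `SPLIT_SLOP` constant. It also assumes the split size equals the block size and the file is splittable.

```java
// Illustrative sketch only: mimics how splits are counted for ONE splittable file.
final class SplitCountSketch {
    // Assumed slack factor: the final split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // Returns the number of input splits (and therefore map tasks) for one file.
    static int countSplits(long fileSize, long splitSize) {
        int splits = 0;
        long remaining = fileSize;
        // Carve off full-size splits while clearly more than one split's worth remains.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        // Whatever is left over (if anything) becomes the last split.
        if (remaining != 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        int perFile = countSplits(100 * mb, 64 * mb); // 64 MB + 36 MB
        System.out.println("splits per file: " + perFile);
        System.out.println("total mappers:   " + (100 * perFile));
    }
}
```

Under these assumptions each 100 MB file yields a 64 MB split plus a 36 MB split, so 100 files give 200 map tasks. If the files had been written with a 128 MB block size instead, the same function returns 1 per file, which would explain an answer of 100.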

Upvotes: 4

user1261215

Reputation:

The answer looks wrong to me.

But it may be correct in the scenarios below:

1) If we override the isSplitable method and return false, then the number of map tasks will be the same as the number of input files. In this case it will be 100.

2) If we configure the mapred.min.split.size and mapred.max.split.size properties. By default, the min split size is 0 and the max split size is Long.MAX_VALUE.

Below is the function it uses to compute the split size, which in turn determines the number of mappers.

max(mapred.min.split.size, min(mapred.max.split.size, blocksize))

In this scenario, if we configure mapred.min.split.size as 100 MB (the property is given in bytes, i.e. 104857600), the split size becomes 100 MB, each file fits in a single split, and we will have 100 mappers.
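To make the effect of the minimum split size concrete, here is a small sketch that evaluates the formula above for this question's numbers. The class `SplitSizeSketch` and its method are hypothetical names for illustration; only the max/min expression is taken from the answer, and the default values (0 and Long.MAX_VALUE) are the ones stated above rather than something I have verified against a particular Hadoop version.

```java
// Illustrative sketch only: evaluates max(minSize, min(maxSize, blockSize)).
final class SplitSizeSketch {
    // Split size as given by the formula quoted in the answer.
    static long computeSplitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb;

        // Defaults as stated in the answer: min = 0, max = Long.MAX_VALUE.
        long defaultSplit = computeSplitSize(0L, Long.MAX_VALUE, blockSize);
        // Minimum split size raised to 100 MB (104857600 bytes).
        long raisedSplit = computeSplitSize(100 * mb, Long.MAX_VALUE, blockSize);

        System.out.println("default split size (MB): " + defaultSplit / mb);
        System.out.println("raised split size (MB):  " + raisedSplit / mb);
    }
}
```

With the defaults the split size equals the 64 MB block size, so each 100 MB file needs two splits. With the minimum raised to 100 MB the split size equals the file size, so each file is a single split and the job runs 100 mappers.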

But according to the given information, I think 100 is not the right answer.

Upvotes: 0
