Pratik Garg

Reputation: 847

Default block size in Hadoop 2.x

The default block size in Hadoop 2.x is 128 MB. What is the problem with 64 MB?

Upvotes: 3

Views: 4674

Answers (2)

ozw1z5rd

Reputation: 3208

HDFS blocks are so large in order to minimize seek time. The optimal block size depends on the average file size, the seek time, and the transfer rate.

The faster the disk, the bigger the data block can be, but there is a limit.

To take advantage of data locality, splits have the same size as data blocks; since one task is started per split, blocks that are too big reduce parallelism. So the best is:

  1. Keep seek time low. ( --> increase block size on fast disks )
  2. Keep the number of splits from getting too low. ( --> decrease block size )
  3. Take advantage of data locality. ( --> keep split size as close to block size as possible )

128 MB is a good choice today, given current disk speeds, disk sizes, and computing performance.
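As a rough back-of-envelope sketch of that seek-time trade-off (the 10 ms seek time and 100 MB/s transfer rate below are assumed example figures, not measurements):

    // Sketch: seek overhead as a fraction of total read time, per block size.
    // Seek time and transfer rate are assumed example values.
    public class BlockSizeSketch {
        public static void main(String[] args) {
            double seekTimeMs = 10.0;        // assumed average disk seek time
            double transferRateMBs = 100.0;  // assumed sequential transfer rate (MB/s)

            for (long blockMB : new long[]{64, 128, 256}) {
                double transferTimeMs = blockMB / transferRateMBs * 1000.0;
                double seekOverhead = seekTimeMs / (seekTimeMs + transferTimeMs);
                System.out.printf("%3d MB block: transfer %.0f ms, seek overhead %.1f%%%n",
                        blockMB, transferTimeMs, seekOverhead * 100);
            }
            // With these figures a 128 MB block keeps seek overhead near 1%,
            // which is the usual rule of thumb behind the larger default.
        }
    }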

Upvotes: 2

Ravindra babu

Reputation: 38940

There are a few reasons for the increase in block size. It improves performance if you are managing a big Hadoop cluster with petabytes of data.

If you are managing a cluster with 1 petabyte of data, a 64 MB block size results in 15+ million blocks, which is difficult for the NameNode to manage efficiently.

Having a lot of blocks will also result in a lot of mappers during MapReduce execution.
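A minimal sketch of that block-count arithmetic (decimal units used for simplicity; the replication factor is ignored):

    // Rough block-count arithmetic for 1 PB of data (replication ignored).
    public class BlockCountSketch {
        public static void main(String[] args) {
            long onePB = 1_000_000_000_000_000L;   // 1 PB in bytes, decimal convention
            long[] blockSizesMB = {64, 128, 256};

            for (long mb : blockSizesMB) {
                long blockBytes = mb * 1_000_000L;  // decimal MB for simplicity
                long blocks = onePB / blockBytes;
                System.out.printf("%3d MB blocks -> ~%,d blocks for the NameNode to track%n",
                        mb, blocks);
            }
            // 64 MB  -> ~15.6 million blocks (the "15+ million" above)
            // 128 MB -> ~7.8 million blocks, halving NameNode metadata and mapper count
        }
    }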

Depending on your data requirements, you can fine-tune dfs.blocksize.
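Cluster-wide, dfs.blocksize is usually set in hdfs-site.xml. As a sketch of the client-side alternative (the path and sizes below are illustrative placeholders, not recommendations):

    // Sketch: choosing the block size from a client instead of cluster-wide.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSizeConfigSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Client-side default for files created with this configuration:
            conf.setLong("dfs.blocksize", 256L * 1024 * 1024);   // 256 MB

            FileSystem fs = FileSystem.get(conf);

            // The block size can also be chosen per file at creation time:
            Path out = new Path("/tmp/example-output");          // placeholder path
            try (FSDataOutputStream stream =
                         fs.create(out, true, 4096, (short) 3, 128L * 1024 * 1024)) {
                stream.writeUTF("block size chosen per file");
            }
        }
    }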

By setting your block size properly (64 MB, 128 MB, 256 MB, or 512 MB), you can achieve:

  1. Improvement in NameNode performance
  2. Improvement in MapReduce job performance, since the number of mappers depends directly on the block size.

Refer to this link for more details.

Upvotes: 4
