sivasg

Reputation: 1

Need clarity on Hadoop block size in a single-node cluster

I have a single-node Hadoop cluster, version 2.x, with the block size set to 64 MB. I have an input file in HDFS of size 84 MB. When I run the MR job, I see 2 splits, which makes sense: 84 MB / 64 MB rounds up to 2, so 2 splits.

But when I run the command "hadoop fsck -blocks" to see the block details, I see this:

Total size:    90984182 B
Total dirs:    16
Total files:   7
Total symlinks:                0
Total blocks (validated):      7 (avg. block size 12997740 B)
Minimally replicated blocks:   7 (100.0 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       0 (0.0 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    1
Average block replication:     1.0
Corrupt blocks:                0
Missing replicas:              0 (0.0 %)
Number of data-nodes:          1
Number of racks:               1

As you can see, the average block size is close to 13 MB. Why is this? Ideally, the block size should be 64 MB, right?

Upvotes: 0

Views: 562

Answers (2)

user3067180

Reputation: 11

The maximum block size is 64 MB, as you specified, but you'd have to be pretty lucky for your average block size to equal the maximum block size.

Consider the one file you mentioned:
1 file, 84 MB
84 MB / 64 MB = 1.3125, rounded up to 2 blocks (one of 64 MB, one of 20 MB)
84 MB / 2 blocks = 42 MB per block on average
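
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: `block_sizes` is a hypothetical helper, not part of any Hadoop API, and it assumes "MB" means 1024 * 1024 bytes.

```python
MB = 1024 * 1024  # assumption: binary megabytes


def block_sizes(file_size, max_block_size):
    """Hypothetical helper: sizes of the blocks a file of file_size bytes occupies."""
    full_blocks, remainder = divmod(file_size, max_block_size)
    sizes = [max_block_size] * full_blocks
    if remainder:
        # The last block only holds the leftover bytes, not a full block.
        sizes.append(remainder)
    return sizes


sizes = block_sizes(84 * MB, 64 * MB)
average_mb = sum(sizes) / len(sizes) / MB
print([s // MB for s in sizes], average_mb)  # [64, 20] 42.0
```

Only the first block reaches the 64 MB maximum; the second holds the remaining 20 MB, which is what pulls the average down to 42 MB.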

You must have some other files bringing the average down even more.
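
As a quick check, the average in the fsck report is consistent with simply dividing the total size by the total number of blocks. The snippet below only reuses the two figures from the question's output; the variable names are made up for illustration.

```python
# Figures copied from the fsck output in the question.
total_size_bytes = 90984182
total_blocks = 7

# Integer division matches the value fsck printed.
avg_block_bytes = total_size_bytes // total_blocks
print(avg_block_bytes)  # 12997740

# Expressed in binary megabytes (assumption: 1 MB = 1024 * 1024 bytes).
print(round(avg_block_bytes / (1024 * 1024), 1))  # 12.4
```

So the roughly 13 MB figure is an average over all 7 blocks reported under the checked path, not the size of any particular block.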

Apart from the NameNode memory needed to track each block, and a possible loss of parallelism if your block size is too large (obviously not an issue on a single-node cluster), an average block size below the maximum is not much of a problem.

Having a 64 MB maximum block size does not mean every block takes up 64 MB on disk.

Upvotes: 1

user3810043

Reputation: 166

When you configure the block size, you set the maximum size a block can be. It is highly unlikely that your files are an exact multiple of the block size, so the last block of most files will be smaller than the configured block size.

Upvotes: 0
