Zombie

Reputation: 402

Hadoop - can mapper output exceed block size

We have dfs.blocksize set to 512 MB for one of our MapReduce jobs, which is a map-only job. However, some of the mappers are outputting more than 512 MB, e.g. 512.9 MB.

I believed the mapper output size would be constrained by dfs.blocksize. Appreciate any input. Thanks

Upvotes: 0

Views: 158

Answers (2)

alex-arkhipov

Reputation: 72

Mappers do not save their outputs in HDFS - they use the regular local file system for saving results. This is done to avoid replicating temporary data across servers in the HDFS cluster. So, the HDFS block size has nothing to do with the mappers' output file size.
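For reference, the local directories Hadoop uses for this intermediate data are controlled by the `mapreduce.cluster.local.dir` property. A sketch of a `mapred-site.xml` entry (the path shown is illustrative, not a default):

```xml
<property>
  <name>mapreduce.cluster.local.dir</name>
  <!-- Comma-separated list of local (non-HDFS) directories
       where intermediate map output is spilled -->
  <value>/data/1/mapred/local,/data/2/mapred/local</value>
</property>
```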

Upvotes: 1

Hari Menon

Reputation: 35405

I believe, the mapper block size should be restrained by the dfs.blocksize.

This is not true. Files can be larger than the block size; they'll just span multiple blocks in that case.
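Concretely, HDFS stores a file as ceil(size / blocksize) blocks, and only the last block may be partially filled. A minimal sketch of the arithmetic for the numbers in the question (plain Python, not the HDFS API):

```python
import math

# Values from the question, in MB.
block_size_mb = 512
file_size_mb = 512.9

# An HDFS file of this size occupies this many blocks:
# one full 512 MB block plus a small 0.9 MB tail block.
num_blocks = math.ceil(file_size_mb / block_size_mb)
print(num_blocks)  # -> 2
```

So a 512.9 MB output file is perfectly legal; it simply uses two blocks instead of one.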

Upvotes: 1

Related Questions