Reputation: 402
We have dfs.blocksize set to 512 MB for one of our MapReduce jobs, which is a map-only job. However, some of the mappers are outputting files larger than 512 MB, e.g. 512.9 MB.
I assumed the mappers' output file size would be constrained by dfs.blocksize. Any input is appreciated. Thanks.
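For reference, a minimal sketch of how a 512 MB block size might be set per job on the job's Configuration (this assumes the override is applied at the job level rather than cluster-wide in hdfs-site.xml; the class and job name below are placeholders):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class BlockSizeConfig {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Per-job override of the HDFS block size used for files
        // this job writes to HDFS (512 MB expressed in bytes).
        conf.set("dfs.blocksize", String.valueOf(512L * 1024 * 1024));

        Job job = Job.getInstance(conf, "map-only-job");
        job.setNumReduceTasks(0); // map-only job
        // ... mapper class, input/output paths, etc.
    }
}
```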
Upvotes: 0
Views: 158
Reputation: 72
Mappers do not save their output in HDFS - they write their results to the regular local file system. This is done so that temporary data is not replicated across servers in the HDFS cluster. So the HDFS block size has nothing to do with the size of a mapper's output file.
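A small sketch to illustrate the distinction, assuming a standard Hadoop 2.x+ configuration: intermediate map output is spilled to the local directories configured by mapreduce.cluster.local.dir, which is a separate setting from dfs.blocksize.

```java
import org.apache.hadoop.conf.Configuration;

public class LocalSpillDirs {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Intermediate map output is spilled to these local-disk
        // directories, not to HDFS, so dfs.blocksize does not apply to it.
        System.out.println("local dirs: " + conf.get("mapreduce.cluster.local.dir"));
        System.out.println("HDFS block size: " + conf.get("dfs.blocksize"));
    }
}
```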
Upvotes: 1
Reputation: 35405
I assumed the mappers' output file size would be constrained by dfs.blocksize.
This is not true. Files can be larger than the block size; they simply span multiple blocks in that case.
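A minimal sketch of how to verify this with the standard FileSystem API, assuming the path of one of the job's output files is passed as the first argument:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlocks {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path(args[0]); // e.g. an output part file from the job
        FileStatus st = fs.getFileStatus(p);

        System.out.printf("length=%d bytes, blockSize=%d bytes%n",
                st.getLen(), st.getBlockSize());

        // A 512.9 MB file with a 512 MB block size simply reports two blocks.
        BlockLocation[] blocks = fs.getFileBlockLocations(st, 0, st.getLen());
        for (BlockLocation b : blocks) {
            System.out.printf("block offset=%d length=%d%n", b.getOffset(), b.getLength());
        }
    }
}
```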
Upvotes: 1