Reputation: 109
I am just curious why the Hadoop map spill size, io.sort.mb, defaults to 100 MB when the block size is 128 MB. Wouldn't it make more sense to set it equal to the block size, since a map task is going to process that much data anyway? Of course I understand that there can be issues with assigning more RAM here, but is there anything more to it?
Upvotes: 1
Views: 209
Reputation: 1967
io.sort.mb is the total amount of buffer memory used to sort map output in memory. As a rule of thumb it should not be set to more than 70% of the available RAM. Block size, on the other hand, is about the chunk size of files stored on disk; you can relate input splits to the HDFS block size.
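To make the distinction concrete, here is a minimal sketch of setting the two properties independently on the job side. It assumes the newer MRv2 property names (mapreduce.task.io.sort.mb and dfs.blocksize); the older names are io.sort.mb and dfs.block.size.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SortBufferExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Sort buffer used by each map task when spilling records;
        // sized against the map task's heap, not the HDFS block size.
        conf.setInt("mapreduce.task.io.sort.mb", 100);        // old name: io.sort.mb

        // HDFS block size controls how the input file is chunked on disk,
        // which in turn drives the default input-split (map task) size.
        conf.setLong("dfs.blocksize", 128L * 1024 * 1024);    // old name: dfs.block.size

        Job job = Job.getInstance(conf, "sort-buffer-vs-block-size");
        // ... set mapper/reducer, input/output paths, then job.waitForCompletion(true);
    }
}
```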
Have a look at this post to get a better idea
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201104.mbox/%[email protected]%3E
Upvotes: 1