Suppose each mapper can use only 1 GB of heap, but the block size is set to 10 GB and each split is 10 GB. How does the mapper read such a large split?
Will the mapper buffer the input to disk and process the split in a round-robin fashion?
Thanks!
The overall pattern of a mapper is quite simple:
while not end of split
    (key, value) = RecordReader.next()
    (keyOut, valueOut) = map(key, value)
    RecordWriter.write(keyOut, valueOut)
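The loop above can be sketched in plain Java. This is a self-contained simulation, not Hadoop code: `LineRecordSource`, `map` and the `Consumer` used as the writer are hypothetical stand-ins. In Hadoop itself, `Mapper.run()` has the same shape, looping on `context.nextKeyValue()` and calling `map(context.getCurrentKey(), context.getCurrentValue(), context)`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

public class MapperLoopSketch {
    // Hypothetical stand-in for a RecordReader: holds only the current record.
    static class LineRecordSource {
        private final BufferedReader in;
        private long key = -1;   // index of the current line within the split
        private String value;    // text of the current line

        LineRecordSource(BufferedReader in) { this.in = in; }

        boolean nextKeyValue() {
            try {
                value = in.readLine(); // the previous line is no longer referenced
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            if (value == null) return false; // end of split
            key++;
            return true;
        }

        long getCurrentKey() { return key; }
        String getCurrentValue() { return value; }
    }

    // Hypothetical map function: emits "line<TAB>length" (the key is unused here).
    static String map(long key, String value) {
        return value + "\t" + value.length();
    }

    // read -> map -> write: per iteration the loop holds the reader's fixed-size
    // buffer plus one input and one output record, whatever the split size.
    static void run(BufferedReader split, Consumer<String> writer) {
        LineRecordSource reader = new LineRecordSource(split);
        while (reader.nextKeyValue()) {
            writer.accept(map(reader.getCurrentKey(), reader.getCurrentValue()));
        }
    }

    public static void main(String[] args) {
        run(new BufferedReader(new StringReader("a\nbb\nccc\n")), System.out::println);
    }
}
```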
The memory needed by the first two operations usually depends only on the size of a single record, not on the size of the split. For example, when TextInputFormat is asked for the next line, it stores the bytes in a buffer until the next end-of-line is found. Then the buffer is cleared for the next line, and so on.
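Here is a simplified Java sketch of that line-buffering behaviour. It is an illustration under assumptions, not Hadoop's `LineRecordReader`: it ignores `\r`, split boundaries and the configurable maximum line length the real reader handles. It shows that the buffer only grows to the length of the longest line, never to the size of the split.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class LineBufferSketch {
    private final InputStream in;
    private final ByteArrayOutputStream buf = new ByteArrayOutputStream();
    private int peakBuffered = 0; // longest line held in the buffer so far, in bytes

    LineBufferSketch(InputStream in) {
        this.in = new BufferedInputStream(in); // avoid one underlying read per byte
    }

    // Returns the next line without its '\n', or null at the end of the input.
    String nextLine() {
        buf.reset(); // clear the previous line (the backing array is reused)
        try {
            int b;
            while ((b = in.read()) != -1) {
                if (b == '\n') return emit();
                buf.write(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.size() > 0 ? emit() : null; // the last line may lack a '\n'
    }

    int peakBuffered() { return peakBuffered; }

    private String emit() {
        peakBuffered = Math.max(peakBuffered, buf.size());
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] split = "first\nsecond line\nlast".getBytes(StandardCharsets.UTF_8);
        LineBufferSketch reader = new LineBufferSketch(new ByteArrayInputStream(split));
        String line;
        while ((line = reader.nextLine()) != null) {
            System.out.println(line);
        }
        System.out.println("peak bytes buffered: " + reader.peakBuffered());
    }
}
```

The flip side is that a single huge line (for example, a file with no newlines at all) would have to fit in the heap.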
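The map implementation is up to you. If your mapper does not keep anything between calls to map(), you are fine. If you make it stateful (for example, by accumulating records in a collection), its memory use can grow with the size of the split and you can run out of heap. Make sure that your memory consumption is bounded.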
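One common way to keep a stateful mapper bounded is in-mapper combining with a size cap: aggregate into a map, emit and clear it whenever it reaches a limit, and emit once more at the end of the split (the role `cleanup()` plays in Hadoop's `Mapper`). The sketch below is plain Java; `BoundedStateMapper`, `maxEntries` and the `BiConsumer` output are hypothetical names, not Hadoop APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

public class BoundedStateMapper {
    private final Map<String, Long> counts = new HashMap<>();
    private final int maxEntries;               // hypothetical cap on keys held in memory
    private final BiConsumer<String, Long> out; // stands in for context.write()

    BoundedStateMapper(int maxEntries, BiConsumer<String, Long> out) {
        this.maxEntries = maxEntries;
        this.out = out;
    }

    // Called once per record. State never exceeds maxEntries plus the
    // distinct words of a single line.
    void map(String line) {
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) counts.merge(word, 1L, Long::sum);
        }
        if (counts.size() >= maxEntries) flush();
    }

    // Called once at the end of the split.
    void cleanup() { flush(); }

    // Emit the partial counts and forget them; the reduce side adds partials up.
    private void flush() {
        counts.forEach(out);
        counts.clear();
    }

    public static void main(String[] args) {
        BoundedStateMapper m = new BoundedStateMapper(2, (k, v) -> System.out.println(k + "\t" + v));
        for (String line : new String[] {"a b", "a c", "c"}) m.map(line);
        m.cleanup();
    }
}
```

Because the same key can be emitted more than once per split, this only works when partial results can be combined later, as with sums or counts.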
In the last step, the keys and values written by your mapper are stored in an in-memory buffer, where they are partitioned and sorted. If the buffer becomes full, its content is spilled to disk. (The output ends up on disk eventually anyway, because reducers must be able to download their partition file even after the mapper has finished.)
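The spill step can be sketched as a bounded buffer that sorts its contents by (partition, key) and writes them to a new file whenever it fills up. This is a simplification: the real map task uses a byte-sized circular buffer (sized by `mapreduce.task.io.sort.mb`, with spilling triggered at the `mapreduce.map.sort.spill.percent` threshold), spills in a background thread, and finally merges all spill files into one partitioned output file, none of which is modelled here. The record-count `capacity` is a hypothetical parameter; the partition function uses the same formula as Hadoop's default `HashPartitioner`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SpillBufferSketch {
    // One map output record tagged with the reducer partition it belongs to.
    static final class Rec {
        final int partition;
        final String key;
        final String value;
        Rec(int partition, String key, String value) {
            this.partition = partition; this.key = key; this.value = value;
        }
    }

    private final int capacity;     // hypothetical: records held before a spill
    private final int numReducers;
    private final Path spillDir;
    private final List<Rec> buffer = new ArrayList<>();
    private final List<Path> spills = new ArrayList<>();

    SpillBufferSketch(int capacity, int numReducers) {
        this.capacity = capacity;
        this.numReducers = numReducers;
        try {
            this.spillDir = Files.createTempDirectory("spills");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Same formula as HashPartitioner: non-negative hash modulo the reducer count.
    int partition(String key) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Called for each (key, value) the mapper writes.
    void write(String key, String value) {
        buffer.add(new Rec(partition(key), key, value));
        if (buffer.size() >= capacity) spill();
    }

    // Sort the buffer by (partition, key), write it to a new file, then empty it.
    private void spill() {
        if (buffer.isEmpty()) return;
        buffer.sort(Comparator.comparingInt((Rec r) -> r.partition).thenComparing(r -> r.key));
        List<String> lines = new ArrayList<>();
        for (Rec r : buffer) lines.add(r.partition + "\t" + r.key + "\t" + r.value);
        Path file = spillDir.resolve("spill" + spills.size() + ".txt");
        try {
            Files.write(file, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        spills.add(file);
        buffer.clear();
    }

    // Spill whatever is left and return all spill files (a real task would merge them).
    List<Path> close() {
        spill();
        return spills;
    }

    public static void main(String[] args) {
        SpillBufferSketch out = new SpillBufferSketch(3, 2);
        for (int i = 0; i < 7; i++) out.write("k" + i, "v" + i);
        System.out.println(out.close()); // three spill files: 3 + 3 + 1 records
    }
}
```

Memory stays capped at `capacity` records no matter how much the mapper writes; the cost of a small buffer is simply more spill files to merge.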
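So the answer to your question is: yes, it will be fine. The split is never loaded into memory as a whole; the mapper streams through it one record at a time.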
What could cause trouble is:
If you want to learn more, here are a few entry points:
TextInputFormat