mephistopheies

Reputation: 79

Hadoop memory usage: reduce container is running beyond physical memory limits

I have simple mappers and the following simple reducer (it joins two large tables on one field):

@Override
protected void reduce(StringLongCompositeKey key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    for (Text text : values) {
        // do some operations with one record and then emit it using context.write,
        // so nothing is stored in memory; one text record is small (no more than 1000 chars)
    }
}

but I got the following error:

14/09/25 17:54:59 INFO mapreduce.Job: map 100% reduce 28%

14/09/25 17:57:14 INFO mapreduce.Job: Task Id : attempt_1410255753549_9772_r_000020_0, Status : FAILED

Container [pid=24481,containerID=container_1410255753549_9772_01_001594] is running beyond physical memory limits. Current usage: 4.1 GB of 4 GB physical memory used; 4.8 GB of 8.4 GB virtual memory used. Killing container.

There is one nuance :-)

Iterable<Text> values

is very long! As I assumed before, and still believe to be true, that Iterable loads the next record on demand, so it should not be a problem for Hadoop to process it without consuming a lot of RAM.
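To illustrate the streaming pattern I am relying on, here is a minimal sketch of such a constant-memory reducer (not my actual job: the key type is simplified to Text, and StreamingJoinReducer and the per-record operation are illustrative). Hadoop hands the loop the same reused Text instance on every iteration, so as long as the reducer only reads each value and emits it, memory use stays flat:

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Minimal sketch of a constant-memory reducer over a long value stream.
public class StreamingJoinReducer extends Reducer<Text, Text, Text, Text> {

    private final Text out = new Text();

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // Hadoop reuses one Text instance across iterations, so nothing
            // accumulates as long as no references to 'value' are retained
            out.set(value);           // illustrative per-record operation
            context.write(key, out);  // emit immediately; O(1) memory
        }
    }
}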

Could this error appear during shuffling or sorting? Is there anything special to know about processing long value sequences?

Upvotes: 1

Views: 2269

Answers (2)

Jeroen Vuurens

Reputation: 1251

It appears the shuffle-sort is running out of memory. Check your configuration to see how you have allocated memory. Via the java.opts you can make sure the reducer's Java heap does not claim the whole container, since the container also needs memory for the OS and core processes; as a rule of thumb, I leave 512 MB for these. An out-of-memory during the shuffle-sort can also come from the shuffle buffers competing for that heap, and lowering the percentage of heap the shuffle is allowed to use often solves the problem. Of course, the best settings depend on your setup.

mapreduce.reduce.memory.mb=4096
mapreduce.reduce.java.opts="-server -Xmx3584m -XX:NewRatio=8 -Djava.net.preferIPv4Stack=true"
mapreduce.reduce.shuffle.input.buffer.percent=0.2
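
For reference, the same settings can be applied programmatically in the job driver (a minimal sketch assuming the standard org.apache.hadoop.mapreduce.Job API; the JoinDriver class name and "join" job name are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JoinDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // 4 GB container per reduce task
        conf.setInt("mapreduce.reduce.memory.mb", 4096);
        // heap = container size minus ~512 MB headroom for OS/core processes
        conf.set("mapreduce.reduce.java.opts",
                "-server -Xmx3584m -XX:NewRatio=8 -Djava.net.preferIPv4Stack=true");
        // cap the shuffle's share of the reducer heap at 20%
        conf.setFloat("mapreduce.reduce.shuffle.input.buffer.percent", 0.2f);

        Job job = Job.getInstance(conf, "join");
        // ... set mapper, reducer, input and output paths as usual, then:
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Note that the properties must be set on the Configuration before calling Job.getInstance, since the Job takes a copy of it.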

Upvotes: 0

Nonnib

Reputation: 467

Could this error appear while shuffling or sorting?

Indeed. This seems to be happening in the shuffle phase when data is being moved to the reducers, before your code actually runs.

The way the reduce percentages work is that 0-33% is the shuffle phase, where map output is copied to the reducers, 33-66% is the sort phase, and the last third (66-100%) represents your reduce code actually running. Your job died at reduce 28%, i.e. still inside the shuffle, before reduce() was ever called.

Upvotes: 2
