mephistopheies

Reputation: 79

Hadoop memory usage: reduce container is running beyond physical memory limits

I have simple mappers and the following simple reducer (it joins two large tables on one field):

@Override
protected void reduce(StringLongCompositeKey key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    for (Text text : values) {
        // do some operations with one record and then emit it using context.write,
        // so nothing is stored in memory; one text record is small (no more than 1000 chars)
    }
}

but I got the following error:

14/09/25 17:54:59 INFO mapreduce.Job: map 100% reduce 28%

14/09/25 17:57:14 INFO mapreduce.Job: Task Id : attempt_1410255753549_9772_r_000020_0, Status : FAILED

Container [pid=24481,containerID=container_1410255753549_9772_01_001594] is running beyond physical memory limits. Current usage: 4.1 GB of 4 GB physical memory used; 4.8 GB of 8.4 GB virtual memory used. Killing container.

There is one nuance :-)

Iterable<Text> values

is very long! As I assumed before, and still believe to be true, that Iterable loads the next record on demand, so it should not be a problem for Hadoop to process it without consuming a lot of RAM.
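To illustrate the streaming pattern I am relying on, here is a minimal sketch of such a constant-memory reducer (not my actual job: the key type is simplified to Text, and StreamingJoinReducer and the per-record operation are illustrative). Hadoop hands the loop the same reused Text instance on every iteration, so as long as the reducer only reads each value and emits it, memory use stays flat:

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Minimal sketch of a constant-memory reducer over a long value stream.
public class StreamingJoinReducer extends Reducer<Text, Text, Text, Text> {

    private final Text out = new Text();

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // Hadoop reuses one Text instance across iterations, so nothing
            // accumulates as long as no references to 'value' are retained
            out.set(value);           // illustrative per-record operation
            context.write(key, out);  // emit immediately; O(1) memory
        }
    }
}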

Could this error appear during shuffling or sorting? Is there anything special to know about processing long value sequences?

Upvotes: 1

Views: 2269

Answers (2)

Jeroen Vuurens

Reputation: 1251

It appears the shuffle-sort is running out of memory. Check your configuration to see how you have allocated memory. Via the java.opts you can make sure the reducer's Java heap does not claim the whole container, since the container also needs memory for the OS and core processes; as a rule of thumb, I leave 512 MB for these. An out-of-memory during the shuffle-sort can also come from the shuffle buffers competing for that heap, and lowering the percentage of heap the shuffle is allowed to use often solves the problem. Of course, the best settings depend on your setup.

mapreduce.reduce.memory.mb=4096
mapreduce.reduce.java.opts="-server -Xmx3584m -XX:NewRatio=8 -Djava.net.preferIPv4Stack=true"
mapreduce.reduce.shuffle.input.buffer.percent=0.2
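
For reference, the same settings can be applied programmatically in the job driver (a minimal sketch assuming the standard org.apache.hadoop.mapreduce.Job API; the JoinDriver class name and "join" job name are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JoinDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // 4 GB container per reduce task
        conf.setInt("mapreduce.reduce.memory.mb", 4096);
        // heap = container size minus ~512 MB headroom for OS/core processes
        conf.set("mapreduce.reduce.java.opts",
                "-server -Xmx3584m -XX:NewRatio=8 -Djava.net.preferIPv4Stack=true");
        // cap the shuffle's share of the reducer heap at 20%
        conf.setFloat("mapreduce.reduce.shuffle.input.buffer.percent", 0.2f);

        Job job = Job.getInstance(conf, "join");
        // ... set mapper, reducer, input and output paths as usual, then:
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Note that the properties must be set on the Configuration before calling Job.getInstance, since the Job takes a copy of it.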

Upvotes: 0

Nonnib

Reputation: 467

Could this error appear while shuffling or sorting?

Indeed. This seems to be happening in the shuffle phase when data is being moved to the reducers, before your code actually runs.

The way the reduce percentages work is that 0-33% is the shuffle phase, where map output is copied to the reducers, 33-66% is the sort phase, and the last third (66-100%) represents your reduce code actually running. Your job died at reduce 28%, i.e. still inside the shuffle, before reduce() was ever called.

Upvotes: 2
