Guillaume

Reputation: 2879

OOM in tez/hive

[After a few answers and comments, I asked a new question based on the knowledge gained here: Out of memory in Hive/tez with LATERAL VIEW json_tuple]

One of my queries consistently fails with the error:

ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1516602562532_3606_2_03, diagnostics=[Task failed, taskId=task_1516602562532_3606_2_03_000001, diagnostics=[TaskAttempt 0 failed, info=[Container container_e113_1516602562532_3606_01_000008 finished with diagnostics set to [Container failed, exitCode=255. Exception from container-launch.
Container id: container_e113_1516602562532_3606_01_000008
Exit code: 255
Stack trace: ExitCodeException exitCode=255: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:933)
    at org.apache.hadoop.util.Shell.run(Shell.java:844)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1123)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 255
]], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)

The keyword here seems to be java.lang.OutOfMemoryError: Java heap space.

I looked around but none of what I thought I understood from Tez helps me:

My query has 4 mappers: 3 finish very quickly, and the 4th dies every time. Here is the Tez graphical view of the query:

[Image: Tez graphical view]

From this image:

e and contact are partitioned, and only one partition is selected in the WHERE clause.

I thus tried to increase the number of maps:

If it's relevant, here are some other memory settings:

My understanding was that Tez can split the work into many smaller loads, so the query would take longer but eventually complete. Am I wrong, or is there a way I have not found?

Context: HDP 2.6, 8 datanodes with 32 GB RAM, query using a chunky LATERAL VIEW based on JSON, run via beeline.

Upvotes: 0

Views: 2414

Answers (2)

user3123372

Reputation: 744

I had the same issue, and increasing all the memory parameters didn't help.

Then I switched to MR and got the error below.

Failed with exception Number of dynamic partitions created is 2795, which is more than 1000.

After setting a higher value for that limit, I switched back to Tez and the problem was solved.
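
The answer does not name the setting that was raised. The error text matches Hive's overall dynamic-partition limit, so here is a minimal sketch of the sequence, assuming the property involved is hive.exec.max.dynamic.partitions (default 1000). The value 3000 is only an illustration chosen to exceed the 2795 partitions reported in the error:

```sql
-- Run the failing query once on MapReduce to surface a more specific error
SET hive.execution.engine=mr;

-- Assumption: this is the limit the error refers to (default 1000).
-- 3000 is an illustrative value above the 2795 partitions reported.
SET hive.exec.max.dynamic.partitions=3000;

-- Switch back to Tez and rerun the query
SET hive.execution.engine=tez;
```

Depending on how the partitions are spread across tasks, the per-node limit hive.exec.max.dynamic.partitions.pernode (default 100) may also need raising.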

Upvotes: 0

BalaramRaju

Reputation: 439

The issue is clearly due to skewed data. I would recommend adding DISTRIBUTE BY COL to the SELECT query on the source table so that the reducers receive evenly distributed data. In the example below, COL3 is a more evenly distributed column, such as an ID column. Example:

ORIGINAL QUERY : insert overwrite table X SELECT COL1,COL2,COL3 from Y
NEW QUERY      : insert overwrite table X SELECT COL1,COL2,COL3 from Y distribute by COL3
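
Before choosing the DISTRIBUTE BY column, it may help to check how evenly its values are actually spread. A rough sketch, reusing the placeholder names Y and COL3 from the example above (the LIMIT of 20 is arbitrary):

```sql
-- Count rows per value of the candidate column.
-- Y and COL3 are the placeholder table and column names from the example above.
SELECT COL3, COUNT(*) AS row_count
FROM Y
GROUP BY COL3
ORDER BY row_count DESC
LIMIT 20;
```

If a few values account for most of the rows, distributing by that column would likely send most of the data to a few reducers, and another column may be a better choice.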

Upvotes: 1
