Alex Raj Kaliamoorthy

Reputation: 2095

Hive Runtime Error: Map local work exhausted memory

I am trying to join two ORC tables in Hive, but I get an error. Here is the query:

select t1.num     as num,
       t1.product as Product,
       t2.value   as OldValue,
       t1.value   as NewValue
from test_new t1
LEFT OUTER JOIN test_old t2
  ON t1.num = t2.num and t1.product = t2.product
where t2.value is NULL and t1.value is not NULL
   or t1.value <> t2.value;

Error:

2017-05-29 11:19:27,157 INFO  [main]: mr.ExecDriver (SessionState.java:printInfo(911)) - Execution log at: /tmp/alex/kaliamoorthya_20170529111919_6621dd64-7a5e-4411-abda-b28fddab8bdc.log
2017-05-29 11:19:27,320 INFO  [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(118)) - <PERFLOG method=deserializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
2017-05-29 11:19:27,321 INFO  [main]: exec.Utilities (Utilities.java:deserializePlan(953)) - Deserializing MapredLocalWork via kryo
2017-05-29 11:19:27,462 INFO  [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(158)) - </PERFLOG method=deserializePlan start=1496056767320 end=1496056767462 duration=142 from=org.apache.hadoop.hive.ql.exec.Utilities>
2017-05-29 11:19:27,472 INFO  [main]: mr.MapredLocalTask (SessionState.java:printInfo(911)) - 2017-05-29 11:19:27   Starting to launch local task to process map join;  maximum memory = 1908932608
2017-05-29 11:19:27,549 INFO  [main]: mr.MapredLocalTask (MapredLocalTask.java:initializeOperators(441)) - fetchoperator for t2 created
2017-05-29 11:19:27,550 INFO  [main]: exec.TableScanOperator (Operator.java:initialize(346)) - Initializing Self TS[0]
2017-05-29 11:19:27,550 INFO  [main]: exec.TableScanOperator (Operator.java:initializeChildren(419)) - Operator 0 TS initialized
2017-05-29 11:19:27,550 INFO  [main]: exec.TableScanOperator (Operator.java:initializeChildren(423)) - Initializing children of 0 TS
2017-05-29 11:19:27,550 INFO  [main]: exec.HashTableSinkOperator (Operator.java:initialize(458)) - Initializing child 1 HASHTABLESINK
2017-05-29 11:19:27,550 INFO  [main]: exec.HashTableSinkOperator (Operator.java:initialize(346)) - Initializing Self HASHTABLESINK[1]
2017-05-29 11:19:27,551 INFO  [main]: mapjoin.MapJoinMemoryExhaustionHandler (MapJoinMemoryExhaustionHandler.java:<init>(61)) - JVM Max Heap Size: 1908932608
2017-05-29 11:19:27,582 INFO  [main]: persistence.HashMapWrapper (HashMapWrapper.java:calculateTableSize(94)) - Key count from statistics is -1; setting map size to 100000
2017-05-29 11:19:27,582 INFO  [main]: exec.HashTableSinkOperator (Operator.java:initialize(394)) - Initialization Done 1 HASHTABLESINK
2017-05-29 11:19:27,582 INFO  [main]: exec.TableScanOperator (Operator.java:initialize(394)) - Initialization Done 0 TS
2017-05-29 11:19:27,582 INFO  [main]: mr.MapredLocalTask (MapredLocalTask.java:initializeOperators(461)) - fetchoperator for t2 initialized
2017-05-29 11:19:28,059 INFO  [main]: Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1174)) - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
2017-05-29 11:19:28,062 INFO  [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(118)) - <PERFLOG method=OrcGetSplits from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
2017-05-29 11:19:28,098 INFO  [main]: orc.OrcInputFormat (OrcInputFormat.java:generateSplitsInfo(961)) - FooterCacheHitRatio: 0/4
2017-05-29 11:19:28,098 INFO  [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(158)) - </PERFLOG method=OrcGetSplits start=1496056768062 end=1496056768098 duration=36 from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
2017-05-29 11:19:28,209 INFO  [main]: orc.OrcRawRecordMerger (OrcRawRecordMerger.java:<init>(430)) - min key = null, max key = null
2017-05-29 11:19:28,209 INFO  [main]: orc.ReaderImpl (ReaderImpl.java:rowsOptions(526)) - Reading ORC rows from hdfs://nameservice1/user/hive/warehouse/alex_tmp.db/test_old/000000_0 with {include: [true, true, true, true], offset: 0, length: 9223372036854775807}
2017-05-29 11:19:28,646 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:28   Processing rows:    200000  Hashtable size: 199999  Memory usage:   130784248   percentage: 0.069
2017-05-29 11:19:28,708 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:28   Processing rows:    300000  Hashtable size: 299999  Memory usage:   159462144   percentage: 0.084
2017-05-29 11:19:28,784 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:28   Processing rows:    400000  Hashtable size: 399999  Memory usage:   207258624   percentage: 0.109
2017-05-29 11:19:28,843 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:28   Processing rows:    500000  Hashtable size: 499999  Memory usage:   235936520   percentage: 0.124
2017-05-29 11:19:28,903 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:28   Processing rows:    600000  Hashtable size: 599999  Memory usage:   274173712   percentage: 0.144
2017-05-29 11:19:28,965 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:28   Processing rows:    700000  Hashtable size: 699999  Memory usage:   312410896   percentage: 0.164
2017-05-29 11:19:29,059 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:29   Processing rows:    800000  Hashtable size: 799999  Memory usage:   359036720   percentage: 0.188
2017-05-29 11:19:29,126 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:29   Processing rows:    900000  Hashtable size: 899999  Memory usage:   397273912   percentage: 0.208
2017-05-29 11:19:29,196 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:29   Processing rows:    1000000 Hashtable size: 999999  Memory usage:   425951800   percentage: 0.223
2017-05-29 11:19:29,263 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:29   Processing rows:    1100000 Hashtable size: 1099999 Memory usage:   464188992   percentage: 0.243
2017-05-29 11:19:29,333 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:29   Processing rows:    1200000 Hashtable size: 1199999 Memory usage:   502426176   percentage: 0.263
2017-05-29 11:19:29,401 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:29   Processing rows:    1300000 Hashtable size: 1299999 Memory usage:   540663360   percentage: 0.283
2017-05-29 11:19:32,752 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:32   Processing rows:    1400000 Hashtable size: 1399999 Memory usage:   485809696   percentage: 0.254
2017-05-29 11:19:32,817 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:32   Processing rows:    1500000 Hashtable size: 1499999 Memory usage:   524582216   percentage: 0.275
2017-05-29 11:19:32,937 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:32   Processing rows:    1600000 Hashtable size: 1599999 Memory usage:   580131976   percentage: 0.304
2017-05-29 11:19:32,998 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:32   Processing rows:    1700000 Hashtable size: 1699999 Memory usage:   618904496   percentage: 0.324
2017-05-29 11:19:33,061 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33   Processing rows:    1800000 Hashtable size: 1799999 Memory usage:   647983888   percentage: 0.339
2017-05-29 11:19:33,124 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33   Processing rows:    1900000 Hashtable size: 1899999 Memory usage:   686756400   percentage: 0.36
2017-05-29 11:19:33,188 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33   Processing rows:    2000000 Hashtable size: 1999999 Memory usage:   725528920   percentage: 0.38
2017-05-29 11:19:33,253 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33   Processing rows:    2100000 Hashtable size: 2099999 Memory usage:   764301440   percentage: 0.40
2017-05-29 11:19:33,316 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33   Processing rows:    2200000 Hashtable size: 2199999 Memory usage:   793380824   percentage: 0.416
2017-05-29 11:19:33,380 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33   Processing rows:    2300000 Hashtable size: 2299999 Memory usage:   832153336   percentage: 0.436
2017-05-29 11:19:33,445 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33   Processing rows:    2400000 Hashtable size: 2399999 Memory usage:   870925856   percentage: 0.456
2017-05-29 11:19:33,510 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33   Processing rows:    2500000 Hashtable size: 2499999 Memory usage:   909698376   percentage: 0.477
2017-05-29 11:19:33,574 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33   Processing rows:    2600000 Hashtable size: 2599999 Memory usage:   938777776   percentage: 0.492
2017-05-29 11:19:38,930 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:38   Processing rows:    2700000 Hashtable size: 2699999 Memory usage:   924140056   percentage: 0.484
2017-05-29 11:19:38,996 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:38   Processing rows:    2800000 Hashtable size: 2799999 Memory usage:   960610440   percentage: 0.503
2017-05-29 11:19:39,063 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39   Processing rows:    2900000 Hashtable size: 2899999 Memory usage:   997080808   percentage: 0.522
2017-05-29 11:19:39,134 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39   Processing rows:    3000000 Hashtable size: 2999999 Memory usage:   1033551200  percentage: 0.541
2017-05-29 11:19:39,203 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39   Processing rows:    3100000 Hashtable size: 3099999 Memory usage:   1070021576  percentage: 0.561
2017-05-29 11:19:39,392 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39   Processing rows:    3200000 Hashtable size: 3199999 Memory usage:   1140046400  percentage: 0.597
2017-05-29 11:19:39,456 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39   Processing rows:    3300000 Hashtable size: 3299999 Memory usage:   1176516784  percentage: 0.616
2017-05-29 11:19:39,519 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39   Processing rows:    3400000 Hashtable size: 3399999 Memory usage:   1212987168  percentage: 0.635
2017-05-29 11:19:39,583 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39   Processing rows:    3500000 Hashtable size: 3499999 Memory usage:   1249457552  percentage: 0.655
2017-05-29 11:19:39,646 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39   Processing rows:    3600000 Hashtable size: 3599999 Memory usage:   1285927936  percentage: 0.674
2017-05-29 11:19:39,710 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39   Processing rows:    3700000 Hashtable size: 3699999 Memory usage:   1322398320  percentage: 0.693
2017-05-29 11:19:39,774 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39   Processing rows:    3800000 Hashtable size: 3799999 Memory usage:   1358868704  percentage: 0.712
2017-05-29 11:19:39,837 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39   Processing rows:    3900000 Hashtable size: 3899999 Memory usage:   1395339088  percentage: 0.731
2017-05-29 11:19:39,904 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39   Processing rows:    4000000 Hashtable size: 3999999 Memory usage:   1431809456  percentage: 0.75
2017-05-29 11:19:39,973 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39   Processing rows:    4100000 Hashtable size: 4099999 Memory usage:   1468279832  percentage: 0.769
2017-05-29 11:19:40,041 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:40   Processing rows:    4200000 Hashtable size: 4199999 Memory usage:   1504750200  percentage: 0.788
2017-05-29 11:19:40,113 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:40   Processing rows:    4300000 Hashtable size: 4299999 Memory usage:   1538933512  percentage: 0.806
2017-05-29 11:19:48,786 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:48   Processing rows:    4400000 Hashtable size: 4399999 Memory usage:   1496365384  percentage: 0.784
2017-05-29 11:19:48,850 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:48   Processing rows:    4500000 Hashtable size: 4499999 Memory usage:   1532580448  percentage: 0.803
2017-05-29 11:19:48,915 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:48   Processing rows:    4600000 Hashtable size: 4599999 Memory usage:   1568795512  percentage: 0.822
2017-05-29 11:19:48,979 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:48   Processing rows:    4700000 Hashtable size: 4699999 Memory usage:   1605010584  percentage: 0.841
2017-05-29 11:19:49,044 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:49   Processing rows:    4800000 Hashtable size: 4799999 Memory usage:   1641225648  percentage: 0.86
2017-05-29 11:19:49,108 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:49   Processing rows:    4900000 Hashtable size: 4899999 Memory usage:   1677440712  percentage: 0.879
2017-05-29 11:19:49,171 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:49   Processing rows:    5000000 Hashtable size: 4999999 Memory usage:   1713655784  percentage: 0.898
2017-05-29 11:19:49,235 INFO  [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:49   Processing rows:    5100000 Hashtable size: 5099999 Memory usage:   1749870856  percentage: 0.917
2017-05-29 11:19:49,246 ERROR [main]: mr.MapredLocalTask (MapredLocalTask.java:executeInProcess(354)) - Hive Runtime Error: Map local work exhausted memory
org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionException: 2017-05-29 11:19:49    Processing rows:    5100000 Hashtable size: 5099999 Memory usage:   1749870856  percentage: 0.917
    at org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionHandler.checkMemoryStatus(MapJoinMemoryExhaustionHandler.java:99)
    at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.processOp(HashTableSinkOperator.java:249)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:409)
    at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:380)
    at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInProcess(MapredLocalTask.java:346)
    at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:743)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

I also tried setting the map memory and reduce memory to 22000, with no luck. After searching the internet, I found a suggestion to set the hive.auto.convert.join property to false in Hive to overcome the above error, and with that setting my query started to run.
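A minimal sketch of that workaround, assuming the property is set per session immediately before the query (the query is the one from the top of the question, only laid out over several lines):

```sql
-- Disable automatic conversion of common joins to map joins for this session,
-- so Hive does not try to build the in-memory hash table in a local task.
set hive.auto.convert.join=false;

select t1.num     as num,
       t1.product as Product,
       t2.value   as OldValue,
       t1.value   as NewValue
from test_new t1
LEFT OUTER JOIN test_old t2
  ON t1.num = t2.num and t1.product = t2.product
where t2.value is NULL and t1.value is not NULL
   or t1.value <> t2.value;
```

With this setting the join runs as a regular shuffle (reduce-side) join. That is typically slower than a map join when one side really is small, but it does not depend on the hash table fitting into the local task's heap.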

I am not sure whether running my query this way gives any performance benefit. Would the performance still be the same? Is there any other way to fix the problem? Please suggest some ideas for improving the performance of the query.

Upvotes: 1

Views: 2118

Answers (1)

pedram bashiri

Reputation: 1376

Your first and safest option is to set hive.auto.convert.join = false. You give up some performance this way because you won't benefit from map joins, but how big a deal that is depends entirely on your use case and your data size. The other option is to tune hive.auto.convert.join.noconditionaltask.size, which according to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization "enables the user to control what size table can fit in memory". Finding the right threshold can be challenging, though.

P.S. Keep in mind that for hive.auto.convert.join.noconditionaltask.size to take effect, hive.auto.convert.join.noconditionaltask needs to be true (which it is by default).
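A hedged sketch of the second option. The property names are the ones discussed above; the threshold value is a placeholder chosen only for illustration and has to be adapted to the actual size of test_old. The size is given in bytes (the documented default is 10000000, about 10 MB). The log shows roughly 1.7 GB of heap used for 5.1 million rows, so the in-memory hash table is presumably far larger than the compressed ORC data that Hive compares against the threshold.

```sql
-- Inspect the size Hive has recorded for the table (look for totalSize and
-- rawDataSize under "Table Parameters" in the output).
describe formatted test_old;

-- Keep automatic map-join conversion on, without the conditional task.
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask=true;

-- Placeholder threshold in bytes (an assumption, not a recommendation):
-- pick a value below the size of test_old so that Hive no longer treats
-- it as small enough to load into memory.
set hive.auto.convert.join.noconditionaltask.size=1000000;
```

Joins against tables that are genuinely small can then still be converted to map joins, while test_old falls back to a common join. Depending on the Hive version, hive.mapjoin.smalltable.filesize may also influence the decision when the conditional path is taken, so it is worth confirming the chosen plan with EXPLAIN before relying on it.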

Upvotes: 2
