Reputation: 2095
I am trying to join two ORC tables in Hive, but I get an error. Here is the query:
select t1.num as num,
       t1.product as Product,
       t2.value as OldValue,
       t1.value as NewValue
from test_new t1
LEFT OUTER JOIN test_old t2
  ON t1.num = t2.num and t1.product = t2.product
where (t2.value is NULL and t1.value is not NULL)
   or t1.value <> t2.value;
Error:
2017-05-29 11:19:27,157 INFO [main]: mr.ExecDriver (SessionState.java:printInfo(911)) - Execution log at: /tmp/alex/kaliamoorthya_20170529111919_6621dd64-7a5e-4411-abda-b28fddab8bdc.log
2017-05-29 11:19:27,320 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(118)) - <PERFLOG method=deserializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
2017-05-29 11:19:27,321 INFO [main]: exec.Utilities (Utilities.java:deserializePlan(953)) - Deserializing MapredLocalWork via kryo
2017-05-29 11:19:27,462 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(158)) - </PERFLOG method=deserializePlan start=1496056767320 end=1496056767462 duration=142 from=org.apache.hadoop.hive.ql.exec.Utilities>
2017-05-29 11:19:27,472 INFO [main]: mr.MapredLocalTask (SessionState.java:printInfo(911)) - 2017-05-29 11:19:27 Starting to launch local task to process map join; maximum memory = 1908932608
2017-05-29 11:19:27,549 INFO [main]: mr.MapredLocalTask (MapredLocalTask.java:initializeOperators(441)) - fetchoperator for t2 created
2017-05-29 11:19:27,550 INFO [main]: exec.TableScanOperator (Operator.java:initialize(346)) - Initializing Self TS[0]
2017-05-29 11:19:27,550 INFO [main]: exec.TableScanOperator (Operator.java:initializeChildren(419)) - Operator 0 TS initialized
2017-05-29 11:19:27,550 INFO [main]: exec.TableScanOperator (Operator.java:initializeChildren(423)) - Initializing children of 0 TS
2017-05-29 11:19:27,550 INFO [main]: exec.HashTableSinkOperator (Operator.java:initialize(458)) - Initializing child 1 HASHTABLESINK
2017-05-29 11:19:27,550 INFO [main]: exec.HashTableSinkOperator (Operator.java:initialize(346)) - Initializing Self HASHTABLESINK[1]
2017-05-29 11:19:27,551 INFO [main]: mapjoin.MapJoinMemoryExhaustionHandler (MapJoinMemoryExhaustionHandler.java:<init>(61)) - JVM Max Heap Size: 1908932608
2017-05-29 11:19:27,582 INFO [main]: persistence.HashMapWrapper (HashMapWrapper.java:calculateTableSize(94)) - Key count from statistics is -1; setting map size to 100000
2017-05-29 11:19:27,582 INFO [main]: exec.HashTableSinkOperator (Operator.java:initialize(394)) - Initialization Done 1 HASHTABLESINK
2017-05-29 11:19:27,582 INFO [main]: exec.TableScanOperator (Operator.java:initialize(394)) - Initialization Done 0 TS
2017-05-29 11:19:27,582 INFO [main]: mr.MapredLocalTask (MapredLocalTask.java:initializeOperators(461)) - fetchoperator for t2 initialized
2017-05-29 11:19:28,059 INFO [main]: Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1174)) - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
2017-05-29 11:19:28,062 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(118)) - <PERFLOG method=OrcGetSplits from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
2017-05-29 11:19:28,098 INFO [main]: orc.OrcInputFormat (OrcInputFormat.java:generateSplitsInfo(961)) - FooterCacheHitRatio: 0/4
2017-05-29 11:19:28,098 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(158)) - </PERFLOG method=OrcGetSplits start=1496056768062 end=1496056768098 duration=36 from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
2017-05-29 11:19:28,209 INFO [main]: orc.OrcRawRecordMerger (OrcRawRecordMerger.java:<init>(430)) - min key = null, max key = null
2017-05-29 11:19:28,209 INFO [main]: orc.ReaderImpl (ReaderImpl.java:rowsOptions(526)) - Reading ORC rows from hdfs://nameservice1/user/hive/warehouse/alex_tmp.db/test_old/000000_0 with {include: [true, true, true, true], offset: 0, length: 9223372036854775807}
2017-05-29 11:19:28,646 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:28 Processing rows: 200000 Hashtable size: 199999 Memory usage: 130784248 percentage: 0.069
2017-05-29 11:19:28,708 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:28 Processing rows: 300000 Hashtable size: 299999 Memory usage: 159462144 percentage: 0.084
2017-05-29 11:19:28,784 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:28 Processing rows: 400000 Hashtable size: 399999 Memory usage: 207258624 percentage: 0.109
2017-05-29 11:19:28,843 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:28 Processing rows: 500000 Hashtable size: 499999 Memory usage: 235936520 percentage: 0.124
2017-05-29 11:19:28,903 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:28 Processing rows: 600000 Hashtable size: 599999 Memory usage: 274173712 percentage: 0.144
2017-05-29 11:19:28,965 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:28 Processing rows: 700000 Hashtable size: 699999 Memory usage: 312410896 percentage: 0.164
2017-05-29 11:19:29,059 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:29 Processing rows: 800000 Hashtable size: 799999 Memory usage: 359036720 percentage: 0.188
2017-05-29 11:19:29,126 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:29 Processing rows: 900000 Hashtable size: 899999 Memory usage: 397273912 percentage: 0.208
2017-05-29 11:19:29,196 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:29 Processing rows: 1000000 Hashtable size: 999999 Memory usage: 425951800 percentage: 0.223
2017-05-29 11:19:29,263 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:29 Processing rows: 1100000 Hashtable size: 1099999 Memory usage: 464188992 percentage: 0.243
2017-05-29 11:19:29,333 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:29 Processing rows: 1200000 Hashtable size: 1199999 Memory usage: 502426176 percentage: 0.263
2017-05-29 11:19:29,401 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:29 Processing rows: 1300000 Hashtable size: 1299999 Memory usage: 540663360 percentage: 0.283
2017-05-29 11:19:32,752 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:32 Processing rows: 1400000 Hashtable size: 1399999 Memory usage: 485809696 percentage: 0.254
2017-05-29 11:19:32,817 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:32 Processing rows: 1500000 Hashtable size: 1499999 Memory usage: 524582216 percentage: 0.275
2017-05-29 11:19:32,937 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:32 Processing rows: 1600000 Hashtable size: 1599999 Memory usage: 580131976 percentage: 0.304
2017-05-29 11:19:32,998 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:32 Processing rows: 1700000 Hashtable size: 1699999 Memory usage: 618904496 percentage: 0.324
2017-05-29 11:19:33,061 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 1800000 Hashtable size: 1799999 Memory usage: 647983888 percentage: 0.339
2017-05-29 11:19:33,124 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 1900000 Hashtable size: 1899999 Memory usage: 686756400 percentage: 0.36
2017-05-29 11:19:33,188 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 2000000 Hashtable size: 1999999 Memory usage: 725528920 percentage: 0.38
2017-05-29 11:19:33,253 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 2100000 Hashtable size: 2099999 Memory usage: 764301440 percentage: 0.40
2017-05-29 11:19:33,316 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 2200000 Hashtable size: 2199999 Memory usage: 793380824 percentage: 0.416
2017-05-29 11:19:33,380 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 2300000 Hashtable size: 2299999 Memory usage: 832153336 percentage: 0.436
2017-05-29 11:19:33,445 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 2400000 Hashtable size: 2399999 Memory usage: 870925856 percentage: 0.456
2017-05-29 11:19:33,510 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 2500000 Hashtable size: 2499999 Memory usage: 909698376 percentage: 0.477
2017-05-29 11:19:33,574 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 2600000 Hashtable size: 2599999 Memory usage: 938777776 percentage: 0.492
2017-05-29 11:19:38,930 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:38 Processing rows: 2700000 Hashtable size: 2699999 Memory usage: 924140056 percentage: 0.484
2017-05-29 11:19:38,996 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:38 Processing rows: 2800000 Hashtable size: 2799999 Memory usage: 960610440 percentage: 0.503
2017-05-29 11:19:39,063 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 2900000 Hashtable size: 2899999 Memory usage: 997080808 percentage: 0.522
2017-05-29 11:19:39,134 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3000000 Hashtable size: 2999999 Memory usage: 1033551200 percentage: 0.541
2017-05-29 11:19:39,203 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3100000 Hashtable size: 3099999 Memory usage: 1070021576 percentage: 0.561
2017-05-29 11:19:39,392 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3200000 Hashtable size: 3199999 Memory usage: 1140046400 percentage: 0.597
2017-05-29 11:19:39,456 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3300000 Hashtable size: 3299999 Memory usage: 1176516784 percentage: 0.616
2017-05-29 11:19:39,519 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3400000 Hashtable size: 3399999 Memory usage: 1212987168 percentage: 0.635
2017-05-29 11:19:39,583 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3500000 Hashtable size: 3499999 Memory usage: 1249457552 percentage: 0.655
2017-05-29 11:19:39,646 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3600000 Hashtable size: 3599999 Memory usage: 1285927936 percentage: 0.674
2017-05-29 11:19:39,710 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3700000 Hashtable size: 3699999 Memory usage: 1322398320 percentage: 0.693
2017-05-29 11:19:39,774 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3800000 Hashtable size: 3799999 Memory usage: 1358868704 percentage: 0.712
2017-05-29 11:19:39,837 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3900000 Hashtable size: 3899999 Memory usage: 1395339088 percentage: 0.731
2017-05-29 11:19:39,904 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 4000000 Hashtable size: 3999999 Memory usage: 1431809456 percentage: 0.75
2017-05-29 11:19:39,973 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 4100000 Hashtable size: 4099999 Memory usage: 1468279832 percentage: 0.769
2017-05-29 11:19:40,041 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:40 Processing rows: 4200000 Hashtable size: 4199999 Memory usage: 1504750200 percentage: 0.788
2017-05-29 11:19:40,113 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:40 Processing rows: 4300000 Hashtable size: 4299999 Memory usage: 1538933512 percentage: 0.806
2017-05-29 11:19:48,786 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:48 Processing rows: 4400000 Hashtable size: 4399999 Memory usage: 1496365384 percentage: 0.784
2017-05-29 11:19:48,850 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:48 Processing rows: 4500000 Hashtable size: 4499999 Memory usage: 1532580448 percentage: 0.803
2017-05-29 11:19:48,915 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:48 Processing rows: 4600000 Hashtable size: 4599999 Memory usage: 1568795512 percentage: 0.822
2017-05-29 11:19:48,979 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:48 Processing rows: 4700000 Hashtable size: 4699999 Memory usage: 1605010584 percentage: 0.841
2017-05-29 11:19:49,044 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:49 Processing rows: 4800000 Hashtable size: 4799999 Memory usage: 1641225648 percentage: 0.86
2017-05-29 11:19:49,108 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:49 Processing rows: 4900000 Hashtable size: 4899999 Memory usage: 1677440712 percentage: 0.879
2017-05-29 11:19:49,171 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:49 Processing rows: 5000000 Hashtable size: 4999999 Memory usage: 1713655784 percentage: 0.898
2017-05-29 11:19:49,235 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:49 Processing rows: 5100000 Hashtable size: 5099999 Memory usage: 1749870856 percentage: 0.917
2017-05-29 11:19:49,246 ERROR [main]: mr.MapredLocalTask (MapredLocalTask.java:executeInProcess(354)) - Hive Runtime Error: Map local work exhausted memory
org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionException: 2017-05-29 11:19:49 Processing rows: 5100000 Hashtable size: 5099999 Memory usage: 1749870856 percentage: 0.917
at org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionHandler.checkMemoryStatus(MapJoinMemoryExhaustionHandler.java:99)
at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.processOp(HashTableSinkOperator.java:249)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:409)
at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:380)
at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInProcess(MapredLocalTask.java:346)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:743)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
I tried setting the map memory and reduce memory to 22000, but that did not help.
After searching the internet, I found a suggestion to set the hive.auto.convert.join = false
property in Hive to overcome the above error, and with that my query started to run.
I am not sure whether running the query this way costs performance. Will the performance still be the same? Is there any other way to fix the problem? Please suggest some ideas for improving the performance of the query.
Upvotes: 1
Views: 2118
Reputation: 1376
Your first and safest option is to set hive.auto.convert.join = false. You give up some performance this way because the join no longer runs as a map join, but how much that matters depends on your use case and your data size. The other option is to tune hive.auto.convert.join.noconditionaltask.size, which according to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization "enables the user to control what size table can fit in memory". Finding the right threshold can be challenging, though.
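As a rough sketch, the two options look like this as session-level settings, run before the query. The threshold value is only an illustrative assumption, not a recommendation. It is given in bytes, and the default is 10000000 (about 10 MB).

```sql
-- Use one option or the other, not both.

-- Option 1: disable automatic conversion to a map join, so Hive
-- falls back to a regular shuffle (reduce-side) join.
set hive.auto.convert.join=false;

-- Option 2: keep automatic conversion on, but lower the size threshold
-- (in bytes) below which tables are loaded into memory.
-- 5000000 is an illustrative value; it has to be tuned for your data.
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask.size=5000000;
```

With option 2 the aim is a threshold small enough that test_old is no longer treated as an in-memory table, while genuinely small tables still get a map join.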
P.S. Keep in mind that for hive.auto.convert.join.noconditionaltask.size to take effect, hive.auto.convert.join.noconditionaltask needs to be true (which it is by default).
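To check what the session is actually using, set followed by a property name and no value prints the current setting. A short sketch:

```sql
-- Print the current value of each property (output has the form property=value).
set hive.auto.convert.join;
set hive.auto.convert.join.noconditionaltask;
set hive.auto.convert.join.noconditionaltask.size;
```

Running this before and after changing a setting confirms that the change applied to the current session.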
Upvotes: 2