I have a small Hadoop cluster with a total of 10 nodes, each equipped with 96 cores, 1 TB of RAM, and 32 TB of SSD storage. I'm having trouble enabling Hive LLAP due to the lack of comprehensive documentation. I’ve experimented with various parameter settings and different combinations to run LLAP daemons, but none of them have worked. Here are my cluster configurations:
# hive-site.xml
<property>
  <name>hive.llap.execution.mode</name>
  <value>all</value>
</property>
<property>
  <name>hive.execution.mode</name>
  <value>llap</value>
</property>
<property>
  <name>hive.llap.daemon.yarn.container.mb</name>
  <value>419840</value>
</property>
<property>
  <name>hive.llap.io.memory.size</name>
  <value>41984</value>
</property>
<property>
  <name>hive.llap.daemon.service.hosts</name>
  <value>@llap0</value>
</property>
<property>
  <name>hive.llap.daemon.num.executors</name>
  <value>8</value>
</property>
<property>
  <name>hive.llap.zk.registry.user</name>
  <value>llap</value>
</property>
<property>
  <name>hive.llap.daemon.queue.name</name>
  <value>llap</value>
</property>
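For context, this is roughly how I intend the sizes to relate, plus the sanity check I run to make sure a ~410 GB daemon container can even be allocated (the YARN property names are the standard ones; I am assuming $HADOOP_CONF_DIR points at my config directory):

# Intended sizing per LLAP daemon (values from hive-site.xml above):
#   hive.llap.daemon.yarn.container.mb = 419840 MB (~410 GB)  total daemon container
#   Xmx (executor heap)                =  41984 MB (~41 GB)
#   hive.llap.io.memory.size           =  41984 MB (~41 GB)   off-heap IO cache
# The container has to fit under both the per-container and per-node YARN limits,
# otherwise the daemon container is never allocated.
grep -A1 'yarn.scheduler.maximum-allocation-mb' "$HADOOP_CONF_DIR/yarn-site.xml"
grep -A1 'yarn.nodemanager.resource.memory-mb'  "$HADOOP_CONF_DIR/yarn-site.xml"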
# fair-scheduler.xml
<?xml version="1.0"?>
<allocations>
  <pool name="hadoop">
    <minResources>8192mb,1vcores</minResources>
    <maxResources>8396800mb,960vcores</maxResources>
    <maxRunningApps>1000</maxRunningApps>
    <weight>1.0</weight>
    <fairSharePreemptionThreshold>0.7</fairSharePreemptionThreshold>
    <fairSharePreemptionTimeout>5</fairSharePreemptionTimeout>
  </pool>
  <pool name="llap">
    <minResources>8192mb,1vcores</minResources>
    <maxResources>4198400mb,480vcores</maxResources>
    <maxRunningApps>1000</maxRunningApps>
    <weight>2.0</weight>
    <fairSharePreemptionThreshold>0.5</fairSharePreemptionThreshold>
    <fairSharePreemptionTimeout>5</fairSharePreemptionTimeout>
  </pool>
</allocations>
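To make sure the queue name in hive-site.xml (hive.llap.daemon.queue.name=llap) actually resolves to this pool, I check it like this (I am assuming the fair-scheduler pool is reported as a YARN queue under the same name):

# Should report the llap queue/pool and its resources; an error here would mean the name does not match.
yarn queue -status llap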
To start the LLAP daemons, I run the command below:
hive --service llap --name llap0 --instances 1 --loglevel debug --cache 41984m --executors 8 --iothreads 2 --size 419840m --xmx 41984m --startImmediately --javaHome $JAVA_HOME
Because of the --startImmediately parameter, this command creates and launches the application as a YARN service. The application appears to start successfully, but when I submit a query, I receive the error below:
killed/failed due to:INIT_FAILURE, Fail to create InputInitializerManager, org.apache.tez.dag.api.TezReflectionException: Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:86)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:155)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:151)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:151)
at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:123)
at org.apache.tez.dag.app.dag.impl.VertexImpl.setupInputInitializerManager(VertexImpl.java:4315)
at org.apache.tez.dag.app.dag.impl.VertexImpl.access$3200(VertexImpl.java:216)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:3089)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:3036)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:3018)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:59)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:2079)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:215)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2245)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2231)
at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:195)
at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:65)
... 25 more
Caused by: java.lang.IllegalArgumentException: No running LLAP daemons! Please check LLAP service status and zookeeper configuration
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
at org.apache.hadoop.hive.ql.exec.tez.Utils.getSplitLocationProvider(Utils.java:57)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:140)
... 30 more
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1724140529947_0103_2_01, diagnostics=[Vertex received Kill in NEW state., Vertex vertex_1724140529947_0103_2_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
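Since the root cause says "No running LLAP daemons", this is what I run to check whether any daemons are actually up and registered in ZooKeeper (llap0 is the service name from my start command; the registry path is my guess based on hive.llap.zk.registry.user=llap and may differ in other setups):

# Hive-side view: state of the LLAP application and its daemons
hive --service llapstatus --name llap0
# YARN-side view: the llap0 YARN service and its component instances
yarn app -status llap0
# ZooKeeper-side view: the registry znodes the Tez AM looks up
zkCli.sh -server <zookeeper-host>:2181 ls /llap-unsecure/user-llap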
Additionally, the command creates a folder containing a run.sh script. However, when I attempt to run it, I encounter the following error:
ERROR client.ApiServiceClient: Fail to launch application:
java.io.IOException:
at org.apache.hadoop.yarn.service.client.ApiServiceClient.getRMWebAddress(ApiServiceClient.java:153)
at org.apache.hadoop.yarn.service.client.ApiServiceClient.getServicePath(ApiServiceClient.java:171)
at org.apache.hadoop.yarn.service.client.ApiServiceClient.getApiClient(ApiServiceClient.java:235)
at org.apache.hadoop.yarn.service.client.ApiServiceClient.actionLaunch(ApiServiceClient.java:380)
at org.apache.hadoop.yarn.client.cli.ApplicationCLI.executeLaunchCommand(ApplicationCLI.java:1265)
at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:198)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:97)
at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:128)
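The empty IOException from getRMWebAddress makes me suspect the client cannot work out the ResourceManager REST endpoint that YARN services need, so I double-check these two settings (standard YARN property names; I am not sure they are the actual culprit):

# Address the YARN service client uses to reach the RM web app
grep -A1 'yarn.resourcemanager.webapp.address' "$HADOOP_CONF_DIR/yarn-site.xml"
# Must be true for the RM to expose the services REST API used by `yarn app -launch`
grep -A1 'yarn.webapp.api-service.enable' "$HADOOP_CONF_DIR/yarn-site.xml"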
In the application logs, I see the errors below:
ERROR [main-EventThread ()] org.apache.curator.framework.imps.EnsembleTracker: Invalid config event received: {server.1=xxx.xxx.xxx.xxx:2888:3888:participant, version=0, server.3=xxx.xxx.xxx.xxx:2888:3888:participant, server.2=xxx.xxx.xxx.xxx:2888:3888:participant}
INFO [main-EventThread ()] org.apache.curator.framework.imps.EnsembleTracker: New config event received: {server.1=xxx.xxx.xxx.xxx:2888:3888:participant, version=0, server.3=xxx.xxx.xxx.xxx:2888:3888:participant, server.2=xxx.xxx.xxx.xxx:2888:3888:participant}
ERROR [main-EventThread ()] org.apache.curator.framework.imps.EnsembleTracker: Invalid config event received: {server.1=xxx.xxx.xxx.xxx:2888:3888:participant, version=0, server.3=xxx.xxx.xxx.xxx:2888:3888:participant, server.2=xxx.xxx.xxx.xxx:2888:3888:participant}
2024-08-19T14:53:54,675 ERROR [main-EventThread ()] org.apache.zookeeper.ClientCnxn: Unexpected throwable
java.lang.NoClassDefFoundError: org/apache/zookeeper/proto/GetAllChildrenNumberResponse
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:654) ~[zookeeper-3.7.1.jar:3.7.1]
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:553) ~[zookeeper-3.7.1.jar:3.7.1]
Caused by: java.lang.ClassNotFoundException: org.apache.zookeeper.proto.GetAllChildrenNumberResponse
at java.net.URLClassLoader.findClass(URLClassLoader.java:387) ~[?:1.8.0_371]
at java.lang.ClassLoader.loadClass(ClassLoader.java:436) ~[?:1.8.0_371]
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355) ~[?:1.8.0_371]
at java.lang.ClassLoader.loadClass(ClassLoader.java:369) ~[?:1.8.0_371]
... 2 more
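As far as I know, GetAllChildrenNumberResponse only exists in newer ZooKeeper releases (3.6+), so my guess is that an older zookeeper jar is shadowing zookeeper-3.7.1.jar somewhere on the classpath. This is how I look for duplicates (paths assume a stock tarball layout, so they may need adjusting):

# All zookeeper jars shipped with Hive and Hadoop
find "$HIVE_HOME/lib" "$HADOOP_HOME/share/hadoop" -name 'zookeeper*.jar' 2>/dev/null
# Which zookeeper jars actually end up on the Hadoop classpath
hadoop classpath | tr ':' '\n' | grep -i zookeeper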
Can someone help me identify the main issue and how to resolve it? Any detailed instructions on how to properly enable Hive with LLAP would be greatly appreciated.