Reputation: 51
I am trying to create an external table using the Hive service of an AWS EMR cluster. The external table points to an S3 location. Below is my create table definition:
CREATE EXTERNAL TABLE IF NOT EXISTS Myschema.MyTable
(
    columnA INT,
    columnB INT,
    columnC STRING
)
PARTITIONED BY ( columnD INT )
STORED AS PARQUET
LOCATION 's3://{bucket-location}/{key-path}/';
Below is the exception I am getting:
2019-04-11T14:44:59,449 INFO [6a95bad7-18e7-49de-856d-43219b7c5069 main([])]: util.PlatformInfo (PlatformInfo.java:getJobFlowId(54)) - Unable to read clusterId from http://localhost:8321/configuration, trying extra instance data file: /var/lib/instance-controller/extraInstanceData.json
2019-04-11T14:44:59,450 INFO [6a95bad7-18e7-49de-856d-43219b7c5069 main([])]: util.PlatformInfo (PlatformInfo.java:getJobFlowId(61)) - Unable to read clusterId from /var/lib/instance-controller/extraInstanceData.json, trying EMR job-flow data file: /var/lib/info/job-flow.json
2019-04-11T14:44:59,450 INFO [6a95bad7-18e7-49de-856d-43219b7c5069 main([])]: util.PlatformInfo (PlatformInfo.java:getJobFlowId(69)) - Unable to read clusterId from /var/lib/info/job-flow.json, out of places to look
2019-04-11T14:45:01,073 INFO [6a95bad7-18e7-49de-856d-43219b7c5069 main([])]: conf.HiveConf (HiveConf.java:getLogIdVar(3956)) - Using the default value passed in for log id: 6a95bad7-18e7-49de-856d-43219b7c5069
2019-04-11T14:45:01,073 INFO [6a95bad7-18e7-49de-856d-43219b7c5069 main([])]: session.SessionState (SessionState.java:resetThreadName(432)) - Resetting thread name to main
2019-04-11T14:45:01,072 ERROR [6a95bad7-18e7-49de-856d-43219b7c5069 main([])]: ql.Driver (SessionState.java:printError(1126)) - FAILED: $ComputationException java.lang.ArrayIndexOutOfBoundsException: 16227
com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$ComputationException: java.lang.ArrayIndexOutOfBoundsException: 16227
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$MapMaker$StrategyImpl.compute(MapMaker.java:553)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$MapMaker$StrategyImpl.compute(MapMaker.java:419)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$CustomConcurrentHashMap$ComputingImpl.get(CustomConcurrentHashMap.java:2041)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$StackTraceElements.forMember(StackTraceElements.java:53)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.Errors.formatSource(Errors.java:690)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.Errors.format(Errors.java:555)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.ProvisionException.getMessage(ProvisionException.java:59)
at java.lang.Throwable.getLocalizedMessage(Throwable.java:391)
at java.lang.Throwable.toString(Throwable.java:480)
at java.lang.Throwable.<init>(Throwable.java:311)
at java.lang.Exception.<init>(Exception.java:102)
at org.apache.hadoop.hive.ql.metadata.HiveException.<init>(HiveException.java:41)
at org.apache.hadoop.hive.ql.parse.SemanticException.<init>(SemanticException.java:41)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.toReadEntity(BaseSemanticAnalyzer.java:1659)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.toReadEntity(BaseSemanticAnalyzer.java:1651)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.toReadEntity(BaseSemanticAnalyzer.java:1647)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:11968)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:11020)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11133)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 16227
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.asm.$ClassReader.readClass(Unknown Source)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.asm.$ClassReader.accept(Unknown Source)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.asm.$ClassReader.accept(Unknown Source)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$LineNumbers.<init>(LineNumbers.java:62)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$StackTraceElements$1.apply(StackTraceElements.java:36)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$StackTraceElements$1.apply(StackTraceElements.java:33)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$MapMaker$StrategyImpl.compute(MapMaker.java:549)
... 37 more
Note: I am able to create the same table successfully when it points to an HDFS location instead.
Upvotes: 5
Views: 3463
Reputation: 611
After debugging into the Hadoop and AWS code, I found that the java.lang.ArrayIndexOutOfBoundsException has nothing to do with the real underlying error.
In fact, EMR/Hadoop generated a different error (which one depends on your situation), but while formatting that error message it triggered the java.lang.ArrayIndexOutOfBoundsException. There is a related issue: https://github.com/google/guice/issues/757
To find the real cause behind it, you have a couple of options:
1. Simulate what you are doing with a standalone command and enable debug mode. For example, I had an error while reading/writing data from/to S3 with EMRFS, so I used the command hdfs dfs -ls s3://xxxxx/xxx instead. Before running it, I enabled debug logging with export HADOOP_ROOT_LOGGER=DEBUG,console. This can surface some interesting errors (see the first sketch below this list).
2. If the first option still doesn't show anything, you can attach a remote debugger, as I did (second sketch below):
2.1 export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005"
2.2 Launch the command hdfs dfs -ls s3://xxxx/xxx. It will wait for a remote client to connect to the JVM for debugging (I declared suspend=y).
2.3 Use an IDE to connect to the JVM. Of course, before that, you need to import or download the related jars into your IDE.
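A minimal sketch of option 1 as a single session on the master node (the s3://xxxxx/xxx path is the placeholder from above; capturing the output into a log file is my own addition):

# Option 1: enable verbose client-side logging for the Hadoop/EMRFS classes,
# then rerun the failing S3 access and keep the output for inspection.
export HADOOP_ROOT_LOGGER=DEBUG,console
hdfs dfs -ls s3://xxxxx/xxx 2>&1 | tee emrfs-debug.log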
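And a sketch of option 2 as one sequence; port 5005 comes from the address= setting above, and <master-node> stands for your master node's address:

# Option 2: make the JVM suspend on startup and wait for a remote debugger.
export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005"
# This call now blocks until a debugger attaches (because of suspend=y).
hdfs dfs -ls s3://xxxx/xxx
# In your IDE, open a remote-debug session against <master-node>:5005,
# with the Hadoop/EMRFS jars available so you can step through their code.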
Amazon really needs to fix this Google Guice library error by upgrading the bundled version.
Upvotes: 2
Reputation: 1137
Run hadoop fs -ls s3:// from the master node as the user in question to verify that you get the same error:
Caused by: java.lang.ArrayIndexOutOfBoundsException: 16227
Check that the user (or the cluster's instance profile) has an IAM role with sufficient S3/DynamoDB permissions.
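A quick way to check both from the master node (the bucket and key path below are placeholders for your own):

# Which credentials/role is this instance actually using?
aws sts get-caller-identity
# Can those credentials see the bucket directly via the AWS CLI?
aws s3 ls s3://my-bucket/my-key-path/
# And via EMRFS, the code path Hive uses?
hadoop fs -ls s3://my-bucket/my-key-path/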
Upvotes: 1
Reputation: 4506
I'm not sure of the exact problem, but when I encountered this I was able to get it to work by using a newly created S3 bucket. Hive just didn't like something about my older bucket.
Edit: I was actually able to fix this with an existing bucket. My EMR configuration had a mis-specification for fs.s3.maxConnections. When I set this to a valid value and spun up a new cluster, the problem went away.
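For anyone hitting the same thing, a hedged sketch of supplying that property through an EMR configuration classification at cluster creation; emrfs-site is the classification EMRFS properties such as fs.s3.maxConnections live under, and the value 100 is only an example:

# Configuration classification for EMRFS; adjust the value to your needs.
cat > emrfs-config.json <<'EOF'
[
  {
    "Classification": "emrfs-site",
    "Properties": {
      "fs.s3.maxConnections": "100"
    }
  }
]
EOF
# Reference it when spinning up the new cluster, e.g.
# aws emr create-cluster ... --configurations file://emrfs-config.json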
Upvotes: 0