Obie Du

Reputation: 41

AvroRuntimeException occurs when executing some hql in hive

I was working through the Hadoop (2.6.0) Twitter example with Flume (1.5.2) and Hive (0.14.0). I fetched data from Twitter successfully via Flume and stored it in my own HDFS.

But when I used Hive to analyze these data (just selecting one field from a table), the query failed with "Failed with exception java.io.IOException:org.apache.avro.AvroRuntimeException: java.io.EOFException", and I could find little useful information related to this exception.

Actually, I can fetch most records of a file successfully (as shown below, 5100 rows were fetched), but the query fails at the very end. As a result I cannot process all the tweet files together.

    Time taken: 1.512 seconds, Fetched: 5100 row(s)
    Failed with exception java.io.IOException:org.apache.avro.AvroRuntimeException: java.io.EOFException
    15/04/15 19:59:18 [main]: ERROR CliDriver: Failed with exception java.io.IOException:org.apache.avro.AvroRuntimeException: java.io.EOFException
    java.io.IOException: org.apache.avro.AvroRuntimeException: java.io.EOFException
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:663)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:561)
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1621)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:267)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    Caused by: org.apache.avro.AvroRuntimeException: java.io.EOFException
        at org.apache.avro.file.DataFileStream.next(DataFileStream.java:222)
        at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.next(AvroGenericRecordReader.java:153)
        at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.next(AvroGenericRecordReader.java:52)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:629)
        ... 15 more
    Caused by: java.io.EOFException
        at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
        at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
        at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:259)
        at org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:107)
        at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:348)
        at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:341)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154)
        at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
        at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
        at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
        ... 18 more

I used the HQL below to create the table:

CREATE TABLE tweets
  ROW FORMAT SERDE
     'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT
     'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
     'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES ('avro.schema.url'='file:///home/hduser/hive-0.14.0-bin/tweetsdoc_new.avsc');

then loaded the tweets file from HDFS:

LOAD DATA INPATH '/user/flume/tweets/FlumeData.1429098355304' OVERWRITE INTO TABLE tweets;
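As one way to get more detail, the raw container file can be inspected byte-by-byte; this sketch assumes the file has first been copied out of HDFS to the local filesystem (the local filename is an assumption):

```shell
# Assumes the Flume output file was copied locally first, e.g. with
#   hdfs dfs -get /user/flume/tweets/FlumeData.1429098355304 FlumeData.avro
FILE=${1:-FlumeData.avro}

# A valid Avro container file starts with the 4-byte magic "Obj" 0x01.
head -c 4 "$FILE" | od -An -c

# If the very last byte is 0x0a, something appended a newline after the
# final data block, which the Avro reader will try to parse.
tail -c 1 "$FILE" | od -An -tx1
```

If the file is structurally intact, `avro-tools tojson` on it should also succeed; failing partway through points at corruption inside the container rather than a schema mismatch.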

Could anyone tell me the possible reason, or an effective way to find more details about this exception?

Upvotes: 1

Views: 1498

Answers (1)

Renat Bekbolatov

Reputation: 329

I had this annoying problem as well.

I looked at the produced binary file and stepped through Avro's deserialization of the bytes.

The reason for this EOFException was that Flume inserts a newline byte after every event (you can notice a 0x0A after every record).

The Avro deserializer thinks the file hasn't finished, interprets that byte as the object count of another block, and then hits end-of-file before it can read that many objects.
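For illustration only, here is a from-scratch sketch of the zig-zag varint decoding Avro's binary format uses (not Avro's own code): the stray byte 0x0A, read where the reader expects a new block's object count, decodes to 5, so the reader expects five more records and runs off the end of the file instead.

```python
def zigzag_decode_varint(data: bytes) -> int:
    """Decode a single zig-zag-encoded varint (Avro's int/long wire format)
    from the front of `data`."""
    n = shift = 0
    for b in data:
        n |= (b & 0x7F) << shift      # low 7 bits carry the payload
        if not (b & 0x80):            # high bit clear means last byte
            break
        shift += 7
    return (n >> 1) ^ -(n & 1)        # undo zig-zag mapping

# Flume's stray newline byte, read where Avro expects an object count:
print(zigzag_decode_varint(b"\x0a"))  # -> 5
```

So each appended newline looks to the reader like the header of a 5-object block that is never actually present, which surfaces as the EOFException above.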

Upvotes: 0

Related Questions