Reputation: 41
I was doing the Hadoop(2.6.0) twitter example by Flume(1.5.2) and Hive(0.14.0). I got data from twitter successfully via Flume and stored them to the my own hdfs.
But when I wanted to use hive to deal with these data to do some analyzing (only select one field from a table), the "Failed with exception java.io.IOException:org.apache.avro.AvroRuntimeException: java.io.EOFException" exception happened and little useful information I could find related to this exception.
Actuall I can fetch most records of a file successfully (like the information below, I fetched 5100 rows successfully) but it would fail in the end. As a result I cannot deal with all the tweets files together.
Time taken: 1.512 seconds, Fetched: 5100 row(s)
Failed with exception java.io.IOException:org.apache.avro.AvroRuntimeException: java.io.EOFException
15/04/15 19:59:18 [main]: ERROR CliDriver: Failed with exception java.io.IOException:org.apache.avro.AvroRuntimeException: java.io.EOFException
java.io.IOException: org.apache.avro.AvroRuntimeException: java.io.EOFException
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:663)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:561)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1621)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:267)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.avro.AvroRuntimeException: java.io.EOFException
at org.apache.avro.file.DataFileStream.next(DataFileStream.java:222)
at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.next(AvroGenericRecordReader.java:153)
at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.next(AvroGenericRecordReader.java:52)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:629)
... 15 more
Caused by: java.io.EOFException
at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:259)
at org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:107)
at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:348)
at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:341)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
... 18 more
I use the hql below to create a table
CREATE TABLE tweets
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.url'='file:///home/hduser/hive-0.14.0-bin/tweetsdoc_new.avsc');
then load tweets file from hdfs
LOAD DATA INPATH '/user/flume/tweets/FlumeData.1429098355304' OVERWRITE INTO TABLE tweets;
Could anyone tell me the possible reason, or an effective way to find more details of the exception?
Upvotes: 1
Views: 1498
Reputation: 329
I had this annoying problem as well.
I looked at the produced binary file and debugged Avro deserialization of bits.
The reason for this EOFException was that Flume inserts new line character byte after every event (you can notice 0x0A after every record).
Avro deserializer thinks the file hasn't finished and interprets that character as some number of blocks to read, but then can't read out that number of blocks without hitting EOF.
Upvotes: 0