Reputation: 146
I have a problem similar to this one.
The following is what I used:
I followed the Protocol Buffers Java tutorial to create my data file "testbook".
Then I created an HDFS folder with
hdfs dfs -mkdir /protobuf_data
and uploaded "testbook" to HDFS with
hdfs dfs -put testbook /protobuf_data
Then I followed the elephant-bird page to create the table; the syntax is like this:
create table addressbook
row format serde 'com.twitter.elephantbird.hive.serde.ProtobufDeserializer'
with serdeproperties (
  'serialization.class' = 'com.example.tutorial.AddressBookProtos$AddressBook')
stored as
  inputformat 'com.twitter.elephantbird.mapred.input.DeprecatedRawMultiInputFormat'
  outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location '/protobuf_data/';
All of that worked.
But when I submit the query select * from addressbook; no results come back,
and I can't find any logs with errors to debug from.
Could someone help me?
Many thanks.
Upvotes: 2
Views: 3573
Reputation: 146
The problem has been solved.
At first I put the protobuf binary data directly into HDFS, and no results showed, because it doesn't work that way.
After asking some senior colleagues, I learned that protobuf binary data needs to be written into some kind of container file format, such as a Hadoop SequenceFile.
The elephant-bird page mentions this too, but at first I didn't fully understand it.
After writing the protobuf binary data into a SequenceFile, I could read the protobuf data with Hive.
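For anyone stuck on the same step, here is a rough sketch of how writing serialized protobuf messages into a SequenceFile can look. This is not my exact code: the class name, output path, and the person's field values are illustrative, it assumes the AddressBook classes generated by the protobuf Java tutorial (a proto2 tutorial; field accessor names may differ in your .proto), and it assumes the Hadoop client libraries are on the classpath. Each record's value is simply the raw serialized message wrapped in a BytesWritable.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.SequenceFile;

import com.example.tutorial.AddressBookProtos.AddressBook;
import com.example.tutorial.AddressBookProtos.Person;

// Hypothetical example class, not from the original post.
public class WriteAddressBookSeqFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Illustrative output path inside the HDFS folder from the question.
        Path out = new Path("/protobuf_data/testbook.seq");

        // Build a small AddressBook message (values are made up).
        AddressBook book = AddressBook.newBuilder()
                .addPerson(Person.newBuilder()
                        .setId(1)
                        .setName("Alice")
                        .build())
                .build();

        // Write the serialized protobuf bytes as the SequenceFile value;
        // the key carries no information here, so NullWritable is used.
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(out),
                SequenceFile.Writer.keyClass(NullWritable.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            writer.append(NullWritable.get(),
                          new BytesWritable(book.toByteArray()));
        }
    }
}
```

After running something like this, the file in /protobuf_data is a real SequenceFile container rather than bare protobuf bytes, which is what the Hive serde expects to iterate over.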
And because I use the SequenceFile format, the create table syntax uses:
inputformat 'org.apache.hadoop.mapred.SequenceFileInputFormat'
outputformat 'org.apache.hadoop.mapred.SequenceFileOutputFormat'
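Putting it together, the revised statement looked roughly like this (a reconstruction, not a verbatim copy of my DDL; only the inputformat/outputformat lines changed from the statement in the question):

```sql
create table addressbook
row format serde 'com.twitter.elephantbird.hive.serde.ProtobufDeserializer'
with serdeproperties (
  'serialization.class' = 'com.example.tutorial.AddressBookProtos$AddressBook')
stored as
  inputformat 'org.apache.hadoop.mapred.SequenceFileInputFormat'
  outputformat 'org.apache.hadoop.mapred.SequenceFileOutputFormat'
location '/protobuf_data/';
```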
Hope this helps others who are new to Hadoop, Hive, and elephant-bird too.
Upvotes: 4