Reputation: 8163
I want to map my HBase table to Hive, this is what i got:
CREATE EXTERNAL TABLE kutschke.bda01.twitter (
rowkey BIGINT,
userId BIGINT,
text STRING,
creationTime STRING,
isRetweet BOOLEAN,
retweetId BIGINT
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key, user:id, text:, time:createdAt, retweet:isRetweet, retweet:retweetId'
TBLPROPERTIES('hbase.table.name' = 'kutschke.bda01.twitter'
However, the 'text:' column doesn't get properly mapped because it has no qualifier. Instead i get the exception:
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException:
MetaException(message:org.apache.hadoop.hive.serde2.SerDeException org.apache.hadoop.hive.hbase.HBaseSerDe:
hbase column family 'text' should be mapped to Map<? extends LazyPrimitive<?, ?>,?>,
that is the Key for the map should be of primitive type, but is mapped to string)
I think i understand the logic behind mapping the whole column family to Map, but is there a way to properly map the column with the empty qualifier? If not, how do i need to go about mapping the column family to a MAP, and how will i retrieve the column i actually want?
Upvotes: 1
Views: 1923
Reputation: 286
This can be done by typing the Hive column as the Hive native map type, like this:
CREATE TABLE hbase_table_1(value map<string,int>, row_key int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = "cf:,:key"
);
The output from the field mapped to a whole CF will be presented as a json string.
More info here : https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-HiveMAPtoHBaseColumnFamily
Upvotes: 0