Reputation: 20319
I have a table in HBase that I'd like to represent as an EXTERNAL TABLE in Hive.
So far I've been using:
CREATE EXTERNAL TABLE events(key STRING, day INT, source STRING, ip STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping"=":key,c:date#b,c:source,c:ipAddress")
TBLPROPERTIES ("hbase.table.name" = "eventTable");
However, my queries aren't balanced properly across my mappers, so I'm trying to partition on IP address:
CREATE EXTERNAL TABLE events(key STRING, source STRING)
PARTITIONED BY (ip STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping"=":key,c:date#b,c:source,c:ipAddress")
TBLPROPERTIES ("hbase.table.name" = "eventTable");
But I receive an error about improper column mappings:
FAILED: Error in metadata: java.lang.RuntimeException:
MetaException(message:org.apache.hadoop.hive.serde2.SerDeException
org.apache.hadoop.hive.hbase.HBaseSerDe: columns has 2 elements while hbase.columns.mapping has 3 elements (counting the key if implicit))
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
I've been looking around, but I can't find any documentation that indicates how to map between an HBase column and a Hive partitioning column.
Upvotes: 1
Views: 2852
Reputation: 1401
I don't think you can partition an external table that easily, especially when the underlying storage is HBase.
Hive's partitioning strategy is built so that the data for each partition is stored in a separate folder (or some other separate storage location). Because of that, partitioning on top of HBase (if it were supported) would require using multiple tables or HBase versions.
I think this post will give you a better understanding of partitioning: http://blog.zhengdong.me/2012/02/22/hive-external-table-with-partitions
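To make the folder-per-partition idea concrete, here is a minimal sketch of an HDFS-backed external table with one partition registered by hand. The table name `raw_events`, the partition column `dt` and the `/data/raw_events` paths are made up for illustration and do not come from your setup.

```sql
-- Hypothetical HDFS-backed table; all names and paths are illustrative only.
CREATE EXTERNAL TABLE raw_events (key STRING, source STRING)
PARTITIONED BY (dt STRING)
LOCATION '/data/raw_events';

-- Each partition is tied to its own directory, which is the part
-- that has no direct equivalent in a single HBase table.
ALTER TABLE raw_events ADD PARTITION (dt = '2012-02-22')
LOCATION '/data/raw_events/dt=2012-02-22';
```

Note that the partition column is not repeated in the regular column list; its value comes from the partition metadata rather than from the stored rows.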
Also, at https://cwiki.apache.org/Hive/hbaseintegration.html you can see that partitioning for HBase-backed tables is left for the future.
If you want partitions, I would recommend loading the data from the HBase-backed Hive table into an HDFS-backed Hive table, but that also depends on your use case.
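As a rough sketch of that approach, assuming your first (unpartitioned) `events` definition is the one in place, you could copy the rows into a native partitioned table using dynamic partitioning. The target name `events_by_ip` is hypothetical, and I have not run this against your data.

```sql
-- Allow Hive to create partitions from the values in the SELECT.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Hypothetical HDFS-backed copy, partitioned on ip.
CREATE TABLE events_by_ip (key STRING, day INT, source STRING)
PARTITIONED BY (ip STRING);

-- The partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE events_by_ip PARTITION (ip)
SELECT key, day, source, ip
FROM events;
```

Keep in mind that a raw IP address has very high cardinality, so this can produce a huge number of small partitions and may run into the limit set by `hive.exec.max.dynamic.partitions`. Partitioning on something coarser derived from the IP may work better, depending on your queries.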
Regards, Dino
Upvotes: 1