slayton

Reputation: 20319

What is the syntax to create an external table partitioned on an HBase column?

I have a table in HBase that I'd like to represent as an EXTERNAL TABLE in Hive.

So far I've been using:

CREATE EXTERNAL TABLE events(key STRING, day INT, source STRING, ip STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping"=":key,c:date#b,c:source,c:ipAddress")
TBLPROPERTIES ("hbase.table.name" = "eventTable");

However, my queries aren't balanced properly across my mappers, so I'm trying to partition on IP address:

CREATE EXTERNAL TABLE events(key STRING, source STRING)
PARTITIONED BY (ip STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping"=":key,c:date#b,c:source,c:ipAddress")
TBLPROPERTIES ("hbase.table.name" = "eventTable");

But I receive an error about improper column mappings:

FAILED: Error in metadata: java.lang.RuntimeException:   
 MetaException(message:org.apache.hadoop.hive.serde2.SerDeException 
 org.apache.hadoop.hive.hbase.HBaseSerDe: columns has 2 elements while hbase.columns.mapping has 3 elements (counting the key if implicit))
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

I've been looking around, but I can't find any documentation that indicates how to map an HBase column to a Hive partitioning column.

Upvotes: 1

Views: 2852

Answers (1)

dino.keco

Reputation: 1401

I don't think you can partition an external table that easily, especially when the underlying storage is HBase.

Hive's partitioning strategy is built so that the data for each partition is stored in a separate folder (or some other separate storage location). Because of that, partitioning with HBase (if it were supported at all) would require using multiple tables or HBase versions.
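To illustrate the folder-per-partition idea, here is a minimal sketch of an ordinary HDFS-backed external table. The table name `logs`, the `day` partition column, and the `/data/logs` paths are hypothetical and only serve the example; none of them come from your setup.

```sql
-- Hypothetical HDFS-backed table, used only to show how partitions map to folders.
CREATE EXTERNAL TABLE logs (key STRING, source STRING)
PARTITIONED BY (day STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/logs';

-- Each partition is registered against its own directory.
ALTER TABLE logs ADD PARTITION (day = '2012-02-22')
LOCATION '/data/logs/day=2012-02-22';
```

The partition value comes from the directory the data lives in, not from a column inside the data, which is why there is nothing in "hbase.columns.mapping" that a partition column could be mapped to.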

I think this post will give you a better understanding of partitioning: http://blog.zhengdong.me/2012/02/22/hive-external-table-with-partitions

And at https://cwiki.apache.org/Hive/hbaseintegration.html you can see that partitioning in HBase is left for the future.

If you want partitions, I would recommend loading the data from the HBase-backed Hive table into a native Hive table on HDFS, but that also depends on your use case.
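A rough sketch of that approach, assuming `events` is your first (working) HBase-backed table definition. The target table name `events_by_ip` is made up for the example, and I haven't run this against your data:

```sql
-- Hypothetical native (HDFS-backed) copy of the data, partitioned by ip.
CREATE TABLE events_by_ip (key STRING, day INT, source STRING)
PARTITIONED BY (ip STRING);

-- Let Hive derive the partition values from the data itself.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE events_by_ip PARTITION (ip)
SELECT key, day, source, ip
FROM events;
```

Keep in mind that every distinct IP address becomes its own partition directory, and Hive limits how many dynamic partitions a single statement may create (see hive.exec.max.dynamic.partitions), so a coarser partition key may turn out to be more practical.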

Regards, Dino

Upvotes: 1
