Denys

Reputation: 4557

Defining Hive external table on top of HBase existing table

There is an empty HBase table with two column families:

create 'emp', 'personal_data', 'professional_data'

Now I am trying to map a Hive external table to it, which would naturally have some columns:

CREATE EXTERNAL TABLE emp(id int, city string, name string, occupation string, salary int) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":id,
                       personal_data:city,
                       personal_data:name,
                       professional_data:occupation,
                       professional_data:salary")
TBLPROPERTIES ("hbase.table.name" = "emp", "hbase.mapred.output.outputtable" = "emp");

Now the error that I get is this:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException org.apache.hadoop.hive.hbase.HBaseSerDe: columns has 5 elements while hbase.columns.mapping has 6 elements (counting the key if implicit))

Could you please help me out? Am I doing something wrong?

Upvotes: 2

Views: 5347

Answers (1)

cheseaux

Reputation: 5315

In your mapping, you're referencing the id field, but the first entry should be the HBase row key keyword, :key. As stated in the documentation:

a mapping entry must be either :key or of the form column-family-name:[column-name][#(binary|string)]

Just replace :id with :key and that should do it:

-- The mapping is kept on one line: the Hive HBase integration docs warn that
-- whitespace between entries is interpreted as part of the column name.
CREATE EXTERNAL TABLE emp(id int, city string, name string, occupation string, salary int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,personal_data:city,personal_data:name,professional_data:occupation,professional_data:salary")
TBLPROPERTIES ("hbase.table.name" = "emp", "hbase.mapred.output.outputtable" = "emp");
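
The "5 elements" versus "6 elements" in the error message can be reproduced with a simplified model of the check. This is only a sketch, not Hive's actual code: it assumes that when no :key entry is present, the row key is counted implicitly as one extra mapping element, which is what "counting the key if implicit" in the message suggests. The function name count_mapping_elements is hypothetical.

```python
def count_mapping_elements(mapping: str) -> int:
    """Count mapping entries, adding one for the row key if ':key' is absent.

    Simplified model (assumption): a missing ':key' entry means the key is
    implicit and still counts as one element.
    """
    entries = [entry.strip() for entry in mapping.split(",")]
    if ":key" not in entries:
        return len(entries) + 1  # implicit row key
    return len(entries)


hive_columns = ["id", "city", "name", "occupation", "salary"]

original = (
    ":id,personal_data:city,personal_data:name,"
    "professional_data:occupation,professional_data:salary"
)
fixed = original.replace(":id", ":key", 1)

# Compare the number of Hive columns with the number of mapping elements.
print(len(hive_columns), count_mapping_elements(original))  # 5 6
print(len(hive_columns), count_mapping_elements(fixed))     # 5 5
```

Under this model, :id is treated as just another column entry, so the key is added on top of the five entries and the counts no longer match.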

The column mapping is based on the order of the columns, not on their names. The "Multiple Columns and Families" section of the documentation shows that the names don't matter:

CREATE TABLE hbase_table_1(key int, value1 string, value2 int, value3 int) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,a:b,a:c,d:e"
)

The mapping is then

  • :key -> key
  • a:b -> value1
  • a:c -> value2
  • d:e -> value3
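
Applied to the emp table from the question, the positional pairing can be pictured as zipping the two lists together. This is a minimal illustrative sketch; pair_by_position is a hypothetical helper, not part of Hive.

```python
def pair_by_position(hive_columns, mapping):
    """Pair each mapping entry with the Hive column at the same index."""
    entries = mapping.split(",")
    if len(entries) != len(hive_columns):
        raise ValueError(
            f"columns has {len(hive_columns)} elements "
            f"while the mapping has {len(entries)} elements"
        )
    return list(zip(entries, hive_columns))


pairs = pair_by_position(
    ["id", "city", "name", "occupation", "salary"],
    ":key,personal_data:city,personal_data:name,"
    "professional_data:occupation,professional_data:salary",
)

# Names play no role in the pairing: ':key' lands on 'id' purely by position.
for hbase_entry, hive_column in pairs:
    print(f"{hbase_entry} -> {hive_column}")
```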

Upvotes: 5
