scarcer

Reputation: 233

Number type value in hbase not recognized by hive

I have a Hive/HBase integration table, defined as below.

create table user_c(user_id int, c_name string, c_kind string, c_industry string,
c_jobtitle string, c_workyear int, c_title string, c_company string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:c_name,cf1:c_kind,cf1:c_industry,cf1:c_jobtitle,cf1:c_workyear,cf1:c_title,cf1:c_company")
TBLPROPERTIES ("hbase.table.name" = "user_c");

In my Java code, I create a Put and fill it with values read from the DB. The code looks like this:

final Put to = new Put(getByte(from, keyColumn));
for (final IColumn column : table.getColumns()) {
    if (column.equals(keyColumn)) continue;
    to.add(Bytes.toBytes(column.getColumnFamily()), Bytes.toBytes(column.getDestName()), getByte(from, column));
}
return to;

getByte is a method that transforms a value into a byte[]. It looks like this:

byte[] getByte(final Map<String, Object> map, IColumn column) {
    final Object val = map.get(column.getName());
    if (val instanceof Integer) {
        return Bytes.toBytes((Integer) val);
    }
    ...
}
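The remaining branches are elided in the question; a plausible continuation (hypothetical, not the asker's actual code) would convert the other value types with the matching Bytes.toBytes overload:

byte[] getByte(final Map<String, Object> map, IColumn column) {
    final Object val = map.get(column.getName());
    if (val instanceof Integer) {
        return Bytes.toBytes((Integer) val);   // 4-byte big-endian int
    }
    if (val instanceof Long) {
        return Bytes.toBytes((Long) val);      // 8-byte big-endian long
    }
    if (val instanceof String) {
        return Bytes.toBytes((String) val);    // UTF-8 bytes of the string
    }
    // fallback for other types (assumption)
    return val == null ? null : Bytes.toBytes(val.toString());
}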

Then I put it into HBase.
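The write itself isn't shown in the question; a minimal sketch using the old HTable client API (the table name comes from the DDL above, everything else is assumed) would be:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

final Configuration conf = HBaseConfiguration.create();
final HTable htable = new HTable(conf, "user_c");
htable.put(to);    // 'to' is the Put built above
htable.close();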

I can scan the record from the HBase shell.

hbase(main):001:0> scan 'user_c'
ROW                                COLUMN+CELL                                                                                      
\x00\x0A\x07\x0D                  column=cf1:c_workyear, timestamp=1350298280554, value=\x00\x00\x07\xD8                         
\x00\x0A\x07\x0D                  column=cf1:c_industry, timestamp=1350298280554, value=120
...

The row key is an Integer, which should be automatically unboxed to a primitive int when processed by the getByte method. Not only the row key but also the other number type column (cf1:c_workyear) shows up as a byte array like \x00\x0A\x07\x0D.

Meanwhile, the String type column (cf1:c_industry) is shown as the plain value it holds.
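That output matches what Bytes.toBytes produces for each Java type. For example, the scanned values above decode exactly as Bytes would encode them:

Bytes.toBytes(2008)     // int    -> \x00\x00\x07\xD8  (4-byte big-endian, the c_workyear cell)
Bytes.toBytes("120")    // String -> \x31\x32\x30, which the shell prints as "120"
Bytes.toBytes(657165)   // int    -> \x00\x0A\x07\x0D  (the row key above)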

Is this alright?

And when I query the record from Hive, it just gives me NULL instead of the value of the number type column.

hive> select c_industry, c_workyear from user_c limit 1;
Total MapReduce CPU Time Spent: 10 seconds 370 msec
OK
120     NULL
Time taken: 46.063 seconds

It seems that the c_workyear value can't be recognized by Hive. I guess it is because the type is not correct. But shouldn't an int byte array be read back as an int value, rather than as a raw byte array?

Does anyone know how to fix this?

Thanks a lot.

Upvotes: 3

Views: 3283

Answers (2)

Prac

Reputation: 11

We faced the same issue, and it was resolved by using #b in the column mapping parameters: ("hbase.columns.mapping" = ":key,C1:Name,C1:marks#b")

Column "marks" was stored as bytearray which is actual long type.

@scarcer, storing all fields as string type will not be an efficient solution.

Upvotes: 0

questionersam

Reputation: 1125

Try this in your table definition:

"hbase.columns.mapping" = ":key,cf1:c_name,cf1:c_kind,cf1:c_industry#b,cf1:c_jobtitle,cf1:c_workyear#b,cf1:c_title,cf1:c_company"

Notice the use of #b after the binary fields. We have been using this successfully for quite some time now.
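Applied to the table from the question, the definition would look roughly like this (same DDL, only the mapping changes):

create table user_c(user_id int, c_name string, c_kind string, c_industry string,
c_jobtitle string, c_workyear int, c_title string, c_company string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:c_name,cf1:c_kind,cf1:c_industry#b,cf1:c_jobtitle,cf1:c_workyear#b,cf1:c_title,cf1:c_company")
TBLPROPERTIES ("hbase.table.name" = "user_c");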

Upvotes: 5
