Reputation: 233
I have a Hive/HBase integration table, defined as below:
CREATE TABLE user_c(user_id int, c_name string, c_kind string, c_industry string,
c_jobtitle string, c_workyear int, c_title string, c_company string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:c_name,cf1:c_kind,cf1:c_industry,cf1:c_jobtitle,cf1:c_workyear,cf1:c_title,cf1:c_company")
TBLPROPERTIES ("hbase.table.name" = "user_c");
In my Java code, I create a Put
and fill it with values read from the DB. The code looks like this:
final Put to = new Put(getByte(from, keyColumn));
for (final IColumn column : table.getColumns()) {
    if (column.equals(keyColumn)) continue;
    to.add(Bytes.toBytes(column.getColumnFamily()),
           Bytes.toBytes(column.getDestName()),
           getByte(from, column));
}
return to;
getByte
is a helper method that transforms a value into a byte[]
. It looks like this:
byte[] getByte(final Map<String, Object> map, IColumn column) {
    final Object val = map.get(column.getName());
    if (val instanceof Integer) {
        return Bytes.toBytes((Integer) val);
    }
    ...
}
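To see why the shell shows values like \x00\x00\x07\xD8, here is a minimal sketch of what Bytes.toBytes(int) produces. The helper below uses the JDK's ByteBuffer as a stand-in for HBase's Bytes class (an assumption to keep the example self-contained; both encode an int as 4 big-endian bytes):

```java
import java.nio.ByteBuffer;

public class IntBytesDemo {
    // Stand-in for HBase's Bytes.toBytes(int): 4 big-endian bytes.
    static byte[] toBytes(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    public static void main(String[] args) {
        byte[] b = toBytes(2008); // the c_workyear value seen in the scan
        StringBuilder sb = new StringBuilder();
        for (byte x : b) sb.append(String.format("\\x%02X", x));
        System.out.println(sb); // prints \x00\x00\x07\xD8, matching the shell output
    }
}
```

So the shell output is the expected rendering of a binary-encoded int; the question is whether Hive can read it back.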
Then I put it into HBase.
I can scan the record from the hbase shell:
hbase(main):001:0> scan 'user_c'
ROW COLUMN+CELL
\x00\x0A\x07\x0D column=cf1:c_workyear, timestamp=1350298280554, value=\x00\x00\x07\xD8
\x00\x0A\x07\x0D column=cf1:c_industry, timestamp=1350298280554, value=120
...
The row key is an Integer
, which should be auto-unboxed to a primitive int
when processed by the getByte
method. Not only the row key but also the other number-typed column (cf1:c_workyear) is shown as a byte array like \x00\x0A\x07\x0D
. Meanwhile the String
-typed column (cf1:c_industry) shows its value as-is.
Is this all right?
And when I query the record from Hive, it just gives me NULL
instead of the value of the number-typed column.
hive> select c_industry, c_workyear from user_c limit 1;
Total MapReduce CPU Time Spent: 10 seconds 370 msec
OK
120 NULL
Time taken: 46.063 seconds
It seems that the c_workyear value can't be recognized by Hive. I guess the type is not correct. But shouldn't an int
byte array be read back as an int
value, rather than a byte array?
Does anyone know how to fix this?
Thanks a lot.
Upvotes: 3
Views: 3283
Reputation: 11
We faced the same issue, and resolved it by using #b in the column mapping parameters: ("hbase.columns.mapping" = ":key,C1:Name,C1:marks#b")
The column "marks" was stored as a byte array that is actually a long.
@scarcer, storing all fields as strings would not be an efficient solution.
Upvotes: 0
Reputation: 1125
Try this in your table definition:
"hbase.columns.mapping" = ":key,cf1:c_name,cf1:c_kind,cf1:c_industry#b,cf1:c_jobtitle,cf1:c_workyear#b,cf1:c_title,cf1:c_company"
Notice the use of #b
after the binary fields. We have been using this successfully for quite some time now.
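Applied to the question's schema, the full table definition would look like this (a sketch combining the original DDL with the mapping above; the #b suffix tells the Hive HBase SerDe to decode those cells as binary rather than UTF-8 strings):

```sql
CREATE TABLE user_c(user_id int, c_name string, c_kind string, c_industry string,
c_jobtitle string, c_workyear int, c_title string, c_company string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:c_name,cf1:c_kind,cf1:c_industry#b,cf1:c_jobtitle,cf1:c_workyear#b,cf1:c_title,cf1:c_company")
TBLPROPERTIES ("hbase.table.name" = "user_c");
```

Note that in the scan output the row key is also stored as a binary int; if querying user_id likewise returns NULL, the same suffix syntax may apply to the key mapping (":key#b") as well.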
Upvotes: 5