Reputation: 353
I am trying to update a table by adding a German string, like this:
put 'table:data_validation_test', '58e1f4200f23e474ca2d7f3a', 'urlbody:data', 'Auslöser'
What I get on scanning this table is this:
scan 'table:data_validation_test'
ROW COLUMN+CELL
58e1f4200f23e474ca2d7f3a column=urlbody:data, timestamp=1491215905923, value=Ausl\xC3\xB6ser
58e1f4200f23e474ca2d7f3a column=urlbody:id, timestamp=1491215697534, value=58e1f4200f23e474ca2d7f3a
I can't find a way to set the string encoding in HBase. How can I store the string in HBase exactly as it is?
Upvotes: 1
Views: 1335
Reputation: 11277
This is just an output issue of the scan command (the same happens with get). In fact, your string is stored correctly.
This happens because ö is encoded in UTF-8 as two bytes (\xC3\xB6), and neither \xC3 nor \xB6 can be displayed as a readable character on its own, so the shell prints them as hex escapes. Remember that in HBase, the main type is Array[Byte].
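To see that nothing is lost, here is a minimal plain-Ruby sketch (no HBase required) that reproduces the effect. The helper shell_escape is a hypothetical name of my own and only approximates how the shell renders bytes; it is not the shell's actual code.

```ruby
# Hypothetical helper: keep printable ASCII bytes as characters and
# show every other byte as a \xNN hex escape (an approximation only).
def shell_escape(bytes)
  bytes.map { |b| b.between?(32, 126) ? b.chr : format('\\x%02X', b) }.join
end

value = "Ausl\u00F6ser"              # "Auslöser" as a UTF-8 string
bytes = value.encode('UTF-8').bytes  # the raw bytes a cell would hold

escaped = shell_escape(bytes)
puts escaped                         # escaped view, similar to the scan output

# Decoding the same bytes as UTF-8 gives back the original string.
decoded = bytes.pack('C*').force_encoding('UTF-8')
puts decoded
```

The escaped form and the decoded form come from the same nine bytes, so the difference is only in how they are displayed.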
If you read your string value back using JRuby (inside the HBase shell):
include Java
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.client.Get
import org.apache.hadoop.hbase.util.Bytes
# Build the client configuration and open the table
conf = HBaseConfiguration.create
htable = HTable.new(conf, 'table:data_validation_test')
# Fetch the row, then decode the cell's bytes as a UTF-8 string
result = htable.get(Get.new('58e1f4200f23e474ca2d7f3a'.to_java_bytes))
puts Bytes.toString(result.getValue('urlbody'.to_java_bytes, 'data'.to_java_bytes))
Then, your value should be displayed properly.
Upvotes: 1