Ravi Ranjan

Reputation: 353

Putting German text in an HBase table

I am trying to update a table by adding a German string as follows:

put 'table:data_validation_test', '58e1f4200f23e474ca2d7f3a', 'urlbody:data', 'Auslöser'

When I scan the table, this is what I get:

scan 'table:data_validation_test'
ROW                                  COLUMN+CELL
 58e1f4200f23e474ca2d7f3a            column=urlbody:data, timestamp=1491215905923, value=Ausl\xC3\xB6ser
 58e1f4200f23e474ca2d7f3a            column=urlbody:id, timestamp=1491215697534, value=58e1f4200f23e474ca2d7f3a

I can't find a way to set the string encoding in HBase. How can I store the string as-is in HBase?

Upvotes: 1


Answers (1)

norbjd

Reputation: 11277

This is only a display issue with the scan command (get behaves the same way). Your string is in fact stored correctly.

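This happens because ö is encoded in UTF-8 as two bytes (\xC3\xB6), and the shell prints every byte outside the printable ASCII range as a \xNN escape instead of as a character. Remember that in HBase, everything is stored as a byte array (byte[]); the shell does not know which character encoding your values use.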
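To see why the scan output looks like this, here is a small plain-Ruby sketch that runs outside HBase. It shows that ö takes two bytes in UTF-8, that escaping every non-printable-ASCII byte reproduces the value=Ausl\xC3\xB6ser output, and that decoding the same bytes as UTF-8 gives back the original string. The shell_display helper is hypothetical and only approximates the shell's formatting (on the Java side, the shell relies on Bytes.toStringBinary for this).

```ruby
# The value as typed in the shell; Ruby source files are UTF-8 by default
value = "Auslöser"

# Raw bytes, i.e. what HBase actually stores in the cell (a byte[])
bytes = value.bytes
puts bytes.length # prints 9: 8 characters, but ö takes two bytes

# Hypothetical helper approximating how the shell renders a cell:
# printable ASCII (except backslash) is shown as-is, every other byte as \xNN
def shell_display(bytes)
  bytes.map do |b|
    if b.between?(0x20, 0x7E) && b != 0x5C
      b.chr
    else
      format("\\x%02X", b)
    end
  end.join
end

puts shell_display(bytes) # prints Ausl\xC3\xB6ser

# Decoding the same bytes as UTF-8 recovers the original string,
# which is what Bytes.toString does on the Java side
decoded = bytes.pack("C*").force_encoding(Encoding::UTF_8)
puts decoded == value # prints true
```

Nothing is lost between the put and the scan; only the rendering differs.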

If you retrieve the value with JRuby (inside the HBase shell) and decode it explicitly:

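include Java
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.ConnectionFactory
import org.apache.hadoop.hbase.client.Get
import org.apache.hadoop.hbase.util.Bytes

# Open a connection using the cluster settings (hbase-site.xml) on the classpath
config = HBaseConfiguration.create
connection = ConnectionFactory.createConnection(config)
table = connection.getTable(TableName.valueOf('table:data_validation_test'))

# Fetch the row and read the raw bytes stored in urlbody:data
result = table.get(Get.new('58e1f4200f23e474ca2d7f3a'.to_java_bytes))

# Bytes.toString decodes the byte array as UTF-8, so ö prints as a character
puts Bytes.toString(result.getValue('urlbody'.to_java_bytes, 'data'.to_java_bytes))

table.close
connection.close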

then the value should be displayed properly, as Auslöser.

Upvotes: 1
