Reputation: 13046
I need to index data stored in HBase rows. The obvious solution is to use the Lily HBase Indexer through replication and push the results into a SOLR collection.
The root of my problem is that some columns in my HBase rows hold short binary values such as MD5, CRC64, UUID and the like. Of course I store them in raw byte[] representation, which saves me a lot of space. But I need to index the data by some of these values while storing the actual representation. What is the correct way to do this?
- BinaryField as the appropriate SOLR field type. But it requires the HBase column content to be Base64 encoded, and the Lily HBase Indexer doesn't look like a solution that supports this.
- bigDecimal. Is it applicable in this case? As I understand it, string itself is not an option.
- The extractHBaseCells command from Cloudera with type byte[], which is promised to be just a transparent pipe. But what should I do with the extracted column data to get a SOLR binary field?
- byte[] as a sequence of 2-digit hex numbers, but is there a good way to map it this way?
Upvotes: 2
Views: 1491
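To make the BinaryField option concrete: as the question notes, the field expects Base64 text, so a raw 16-byte digest from an HBase cell needs an encoding step somewhere in the pipeline. The following is a minimal sketch using only the JDK. The class name RawToBase64 and the sample input are hypothetical, and nothing here shows how the Lily HBase Indexer would call such code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper: converts raw cell bytes to and from Base64 text.
class RawToBase64 {

    // Encode raw bytes (for example, a 16-byte MD5) as standard Base64.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Decode Base64 text back into the original raw bytes.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Stand-in for a value read from an HBase column.
        byte[] md5 = MessageDigest.getInstance("MD5")
                .digest("example".getBytes(StandardCharsets.UTF_8));
        String encoded = encode(md5);
        // A 16-byte value always becomes 24 Base64 characters (with padding).
        System.out.println(md5.length + " raw bytes, "
                + encoded.length() + " Base64 chars: " + encoded);
    }
}
```

The size overhead is about one third, which is the price of moving a binary value through a text-based update request.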
Reputation: 13046
Arrived at a working solution:
- The row mapping type. As a result, the document ID (unique key) is the HBase row key.
- The extractHBaseCells command from Cloudera Search. A mapping with type byte[] is used, which turned out to produce exactly Base64-encoded fields.
UPDATE 1:
Added HBASE_INDEXER_CLASSPATH environment configuration for the HBase indexer and an additional class extending com.ngdata.hbaseindexer.uniquekey.BaseUniqueKeyFormatter, which now performs Base64 encoding of the unique key so that it can be declared as BinaryField. This finally did everything I demand from the indexer: SOLR now receives correct 'update' requests with a Base64-encoded 'id' field and fields mapped from the other columns I need.
UPDATE 2:
Instead of solr.BinaryField, I settled on plain solr.StrField for everything that I need to index as is. Binary byte strings such as hashes are transformed into a sequence of lowercase hex digits, 2 digits per byte. This may not be the best in terms of performance, but it looks the most portable and flexible. For 'just stored' fields I already have a Base64 encoder, but I don't store fields in SOLR if I don't index them.
Upvotes: 3
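The hex transformation described in UPDATE 2 can be sketched as follows. This is an illustration of the encoding only, assuming plain JDK code. The class name HexKey and its methods are hypothetical, not the code actually plugged into the indexer.

```java
// Hypothetical helper: maps raw bytes to lowercase hex text, 2 digits per byte.
class HexKey {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    // Encode each byte as two lowercase hex digits, high nibble first.
    static String toHex(byte[] raw) {
        char[] out = new char[raw.length * 2];
        for (int i = 0; i < raw.length; i++) {
            int v = raw[i] & 0xff;            // treat the byte as unsigned
            out[2 * i] = DIGITS[v >>> 4];     // high nibble
            out[2 * i + 1] = DIGITS[v & 0x0f]; // low nibble
        }
        return new String(out);
    }

    // Decode a hex string back into raw bytes, e.g. to look up the HBase row.
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit at byte " + i);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = {0x00, 0x0f, (byte) 0xa5, (byte) 0xff};
        System.out.println(toHex(raw)); // prints 000fa5ff
    }
}
```

A fixed-width lowercase form keeps equal byte strings equal as text, which is what exact-match lookups on a string field need. The cost is two characters per byte instead of roughly 1.33 with Base64.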