How to find a value among all columns in a row in HBase

Question

I have an HBase table that stores user objects. Each object has 4 columns, and I name each column [object_creation_date]_[column_name] so that the columns are automatically ordered by object creation date.

For example:
RowKey  20140101_a  20140101_b  20140101_c  20140101_d  20140102_a  20140102_b  20140102_c  20140102_d
1       1a          1b          1c          1d          2a          2b          2c          2d

Now I'm trying to filter these values by column. Is there any way to find the object (set of 4 columns) whose property "c" equals "2c"? It should return 20140102.

I tried using the ColumnRangeFilter class, but it only seems to work with prefixes, and I'd need a regular expression to find all "c" columns regardless of the date they were created.

Is there another way of doing this or maybe I could use a different representation for the data?

Rubén Moraleda · Accepted Answer

You could always implement your own filter, but why not just invert the order of the qualifier parts to [column_name]_[object_creation_date]? That way all "c" columns share the prefix "c_", and you can use the standard ColumnPrefixFilter, which seems more appropriate.
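To see why the inverted order helps, note that HBase keeps the qualifiers of a row sorted lexicographically. The sketch below models one row as a Java TreeMap (an assumption to keep it self-contained; this is not the HBase client API, and the class and method names are hypothetical). It finds the date whose "c" column equals "2c" twice: with a regex over the original [date]_[name] qualifiers, which is roughly what a custom filter would have to do, and with a contiguous prefix range over inverted [name]_[date] qualifiers, which is the range ColumnPrefixFilter selects.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QualifierLookup {

    // Original layout ("20140102_c"): the "c" columns are scattered through
    // the row, so every qualifier must be tested against a pattern.
    static String findDateWide(NavigableMap<String, String> row, String column, String value) {
        Pattern p = Pattern.compile("(\\d{8})_" + Pattern.quote(column));
        for (Map.Entry<String, String> cell : row.entrySet()) {
            Matcher m = p.matcher(cell.getKey());
            if (m.matches() && cell.getValue().equals(value)) {
                return m.group(1);
            }
        }
        return null;
    }

    // Inverted layout ("c_20140102"): all "c" columns are adjacent in sort
    // order, so only the range starting at "c_" is visited.
    static String findDateInverted(NavigableMap<String, String> row, String column, String value) {
        String prefix = column + "_";
        for (Map.Entry<String, String> cell : row.tailMap(prefix, true).entrySet()) {
            if (!cell.getKey().startsWith(prefix)) {
                break; // left the prefix range, nothing more to find
            }
            if (cell.getValue().equals(value)) {
                return cell.getKey().substring(prefix.length());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> wide = new TreeMap<>();
        NavigableMap<String, String> inverted = new TreeMap<>();
        String[] dates = {"20140101", "20140102"};
        for (int i = 0; i < dates.length; i++) {
            for (String c : new String[] {"a", "b", "c", "d"}) {
                String v = (i + 1) + c; // "1a" ... "2d", as in the question
                wide.put(dates[i] + "_" + c, v);
                inverted.put(c + "_" + dates[i], v);
            }
        }
        System.out.println(findDateWide(wide, "c", "2c"));         // 20140102
        System.out.println(findDateInverted(inverted, "c", "2c")); // 20140102
    }
}
```

The inverted version can stop as soon as it leaves the "c_" range, while the regex version always touches every qualifier in the row.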

Anyway, I think you should consider moving from a wide table design to a tall one:

RowKey      u   a   b   c   d
1_20140101  1   1a  1b  1c  1d
1_20140102  1   2a  2b  2c  2d
1_20140103  1   3a  3b  3c  3d

Because all rows of a user share the row key prefix [user_id]_, this allows very fast prefix scans for a known user, or a full table scan that checks a single known column.
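A minimal sketch of both access patterns on the tall layout, assuming the table is modelled as a TreeMap from row key to a column map (a stand-in for HBase's sorted row keys, not its client API; class and method names are hypothetical). The known-user lookup reads only the contiguous "[user_id]_" key range; the full scan visits every row but checks only one column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class TallTable {

    // Row key = userId + "_" + creationDate, e.g. "1_20140102".
    static String rowKey(String userId, String date) {
        return userId + "_" + date;
    }

    // Known user: only the key range starting with "<userId>_" is read.
    static String findDateForUser(NavigableMap<String, Map<String, String>> table,
                                  String userId, String column, String value) {
        String prefix = userId + "_";
        for (Map.Entry<String, Map<String, String>> row : table.tailMap(prefix, true).entrySet()) {
            if (!row.getKey().startsWith(prefix)) {
                break; // past this user's rows
            }
            if (value.equals(row.getValue().get(column))) {
                return row.getKey().substring(prefix.length());
            }
        }
        return null;
    }

    // Unknown user: every row is visited, but only one column is compared.
    static List<String> findRowKeys(NavigableMap<String, Map<String, String>> table,
                                    String column, String value) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            if (value.equals(row.getValue().get(column))) {
                keys.add(row.getKey());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        NavigableMap<String, Map<String, String>> table = new TreeMap<>();
        String[] dates = {"20140101", "20140102", "20140103"};
        for (int i = 0; i < dates.length; i++) {
            Map<String, String> cols = new TreeMap<>();
            cols.put("u", "1");
            for (String c : new String[] {"a", "b", "c", "d"}) {
                cols.put(c, (i + 1) + c);
            }
            table.put(rowKey("1", dates[i]), cols);
        }
        System.out.println(findDateForUser(table, "1", "c", "2c")); // 20140102
        System.out.println(findRowKeys(table, "c", "2c"));          // [1_20140102]
    }
}
```

The "_" separator matters: without it, a prefix scan for user "1" would also pick up rows of user "10".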

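Or, if you want to avoid having multiple rows for the same user, you can use versioning: store each object as a new version of the same cells, with the object creation date as the cell timestamp.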
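A rough sketch of the versioning idea, assuming the creation date is turned into a UTC-midnight epoch-millisecond timestamp and that each column keeps its versions newest first, the order in which HBase returns them. The in-memory map and the names used here are hypothetical stand-ins, not the HBase API. On a real table the column family would also need its VERSIONS setting raised (by default a family keeps only one version), and the Get would have to request all versions.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class VersionedRow {

    // column -> (timestamp -> value), newest version first
    private final Map<String, NavigableMap<Long, String>> cells = new TreeMap<>();

    // "20140102" -> epoch millis at 2014-01-02T00:00Z, used as the cell timestamp
    static long toTimestamp(String yyyymmdd) {
        return LocalDate.parse(yyyymmdd, DateTimeFormatter.BASIC_ISO_DATE)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
    }

    // Inverse of toTimestamp: epoch millis -> "yyyyMMdd" (UTC)
    static String toDate(long timestamp) {
        return Instant.ofEpochMilli(timestamp)
                .atZone(ZoneOffset.UTC)
                .toLocalDate()
                .format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    void put(String column, String date, String value) {
        cells.computeIfAbsent(column, k -> new TreeMap<Long, String>(Collections.reverseOrder()))
             .put(toTimestamp(date), value);
    }

    // Walk the versions of one column and return the creation date of the match.
    String findDate(String column, String value) {
        NavigableMap<Long, String> versions = cells.get(column);
        if (versions == null) {
            return null;
        }
        for (Map.Entry<Long, String> v : versions.entrySet()) {
            if (v.getValue().equals(value)) {
                return toDate(v.getKey());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        VersionedRow user1 = new VersionedRow();
        String[] dates = {"20140101", "20140102", "20140103"};
        for (int i = 0; i < dates.length; i++) {
            for (String c : new String[] {"a", "b", "c", "d"}) {
                user1.put(c, dates[i], (i + 1) + c);
            }
        }
        System.out.println(user1.findDate("c", "2c")); // 20140102
    }
}
```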

Tip: For even better query performance, you could build your own "secondary index" table with row keys of the form [column]_[first 4 hex chars of md5(value)]_[user_id] (u holds the user id and d the creation date):

RowKey    value   u  d 
a_afaa_1  1a      1  20140101 
a_a32a_1  2a      1  20140102
a_45ae_1  3a      1  20140103
b_l413_1  1b      1  20140101 
b_533a_1  2b      1  20140102
b_8ce3_1  3b      1  20140103
c_b31c_1  1c      1  20140101 
c_2ca1_1  2c      1  20140102
c_a99f_1  3c      1  20140103

This makes looking up any column value very fast: compute the MD5 of the value you are searching for, take the first 4 characters of its hex string, scan with the row prefix [column]_[hash], and add a value filter (e.g. a SingleColumnValueFilter on the value column), because several different values can share the same hash prefix.
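A minimal sketch of the index, assuming a TreeMap keyed by the index row key stands in for the index table (hypothetical names throughout; on a real table the prefix would go into a Scan and the exact comparison into a value filter). It builds keys as [column]_[first 4 hex chars of md5(value)]_[user_id], prefix-scans [column]_[hash]_, and then keeps only rows whose stored value matches exactly.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class ValueIndex {

    // One index row: the indexed value, the user id (u) and the creation date (d).
    static final class IndexRow {
        final String value;
        final String user;
        final String date;

        IndexRow(String value, String user, String date) {
            this.value = value;
            this.user = user;
            this.date = date;
        }
    }

    // First 4 hex characters of md5(value), i.e. the first two digest bytes.
    static String hashPrefix(String value) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(value.getBytes(StandardCharsets.UTF_8));
            return String.format("%02x%02x", d[0] & 0xff, d[1] & 0xff);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    static String indexKey(String column, String value, String userId) {
        return column + "_" + hashPrefix(value) + "_" + userId;
    }

    static void put(NavigableMap<String, IndexRow> index, String column,
                    String value, String user, String date) {
        index.put(indexKey(column, value, user), new IndexRow(value, user, date));
    }

    // Prefix scan on "[column]_[hash]_", then an exact value check, since
    // different values can share the same 4-character hash prefix.
    static List<IndexRow> lookup(NavigableMap<String, IndexRow> index, String column, String value) {
        String prefix = column + "_" + hashPrefix(value) + "_";
        List<IndexRow> hits = new ArrayList<>();
        for (Map.Entry<String, IndexRow> e : index.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            if (e.getValue().value.equals(value)) {
                hits.add(e.getValue());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableMap<String, IndexRow> index = new TreeMap<>();
        String[] dates = {"20140101", "20140102", "20140103"};
        for (int i = 0; i < dates.length; i++) {
            for (String c : new String[] {"a", "b", "c"}) {
                put(index, c, (i + 1) + c, "1", dates[i]);
            }
        }
        for (IndexRow r : lookup(index, "c", "2c")) {
            System.out.println(r.user + " " + r.date);
        }
    }
}
```

One consequence of this key layout: if the same user has the same value in two objects, both map to the same index row key, so the later write replaces the earlier one (in the sketch, and in HBase when the family keeps a single version). Appending the date to the key would keep both.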

You can also restrict this table to the columns you will actually query, to avoid storing data you won't need.


