Reputation: 4501
I have a table in HBase to store user objects. Each object has 4 columns, and I name each column [object_creation_date]_[column_name] so that the columns are automatically ordered by object creation date.
For example:
RowKey  20140101_a  20140101_b  20140101_c  20140101_d  20140102_a  20140102_b  20140102_c  20140102_d
1       1a          1b          1c          1d          2a          2b          2c          2d
Now I'm trying to filter these values by column. Is there any way to find the object (set of 4 columns) whose property "C" equals "2c"? It should return 20140102.
I tried using the ColumnRangeFilter class, but it only seems to work with prefixes, and I'd need a RegExp to find all "C" columns regardless of the date they were created.
Is there another way of doing this or maybe I could use a different representation for the data?
Upvotes: 0
Views: 489
Reputation: 25909
Since the structure of your qualifiers is fixed to 9 bytes of date + separator, it would be relatively easy to create your own filter by modifying the code of ColumnPrefixFilter.
On each line where it compares bytes of the qualifier name, add 9 to the offset (and subtract 9 from the qualifier length, so the comparison does not read past the end of the qualifier), as in
    int cmp = Bytes.compareTo(buffer, qualifierOffset + 9, qualifierLength - 9, this.prefix, 0, qualifierLength - 9);
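To make the offset arithmetic concrete, here is a minimal plain-Java sketch of the comparison such a filter would perform. It does not use the HBase API; the class name DateSkippingMatcher and the method matches are hypothetical, and the sketch assumes qualifiers always start with an 8-character date followed by "_".

```java
import java.nio.charset.StandardCharsets;

/** Plain-Java illustration (not an HBase filter): compare a qualifier after a fixed-width date prefix. */
final class DateSkippingMatcher {
    // Assumption: 8 date bytes (yyyyMMdd) plus the '_' separator, as in the question's qualifiers.
    static final int DATE_PREFIX_LENGTH = 9;

    /** Returns true when the bytes following the date prefix start with {@code wanted}. */
    static boolean matches(byte[] buffer, int qualifierOffset, int qualifierLength, byte[] wanted) {
        int remaining = qualifierLength - DATE_PREFIX_LENGTH;
        if (remaining < wanted.length) {
            return false; // the qualifier is too short to contain the wanted column name
        }
        int start = qualifierOffset + DATE_PREFIX_LENGTH;
        for (int i = 0; i < wanted.length; i++) {
            if (buffer[start + i] != wanted[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] qualifier = "20140102_c".getBytes(StandardCharsets.US_ASCII);
        byte[] wanted = "c".getBytes(StandardCharsets.US_ASCII);
        System.out.println(matches(qualifier, 0, qualifier.length, wanted));
    }
}
```

Note that a real filter would also have to decide what to return for non-matching columns; since the columns are ordered by date rather than by name, it probably cannot skip ahead the way ColumnPrefixFilter does.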
Upvotes: 0
Reputation: 3067
You could always implement your own filter, but why don't you just invert the order of the columns to [column_name]_[object_creation_date]? That way you can use the standard ColumnPrefixFilter, which seems more appropriate.
Anyway, I think you should consider moving from a wide to a tall approach:
RowKey      u  a   b   c   d
1_20140101  1  1a  1b  1c  1d
1_20140102  1  2a  2b  2c  2d
1_20140103  1  3a  3b  3c  3d
This will allow you to perform very fast scans for a known user or even a full table scan for a known column.
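As a rough illustration of why the tall layout makes this lookup simple, the sketch below simulates the table with a sorted in-memory map instead of HBase. The class TallTableSketch and its methods are hypothetical names; the row keys and values are the ones from the table above, plus one extra user added only to show that the prefix scan skips other users.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** In-memory stand-in for the tall table: sorted row key -> (column -> value). Not HBase code. */
final class TallTableSketch {
    static Map<String, String> row(String u, String a, String b, String c, String d) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("u", u);
        columns.put("a", a);
        columns.put("b", b);
        columns.put("c", c);
        columns.put("d", d);
        return columns;
    }

    static TreeMap<String, Map<String, String>> sampleTable() {
        TreeMap<String, Map<String, String>> table = new TreeMap<>();
        table.put("1_20140101", row("1", "1a", "1b", "1c", "1d"));
        table.put("1_20140102", row("1", "2a", "2b", "2c", "2d"));
        table.put("1_20140103", row("1", "3a", "3b", "3c", "3d"));
        table.put("2_20140101", row("2", "2c", "xb", "xc", "xd")); // extra user, illustration only
        return table;
    }

    /** Scans one user's rows and returns the date of the first row whose column equals value, or null. */
    static String findDate(TreeMap<String, Map<String, String>> table,
                           String userId, String column, String value) {
        String prefix = userId + "_";
        // All keys that start with the prefix sort inside [prefix, prefix + '\uffff').
        for (Map.Entry<String, Map<String, String>> entry
                : table.subMap(prefix, prefix + Character.MAX_VALUE).entrySet()) {
            if (value.equals(entry.getValue().get(column))) {
                return entry.getKey().substring(prefix.length());
            }
        }
        return null;
    }
}
```

Calling findDate(sampleTable(), "1", "c", "2c") returns "20140102", which is the answer the question asks for. In HBase the same idea would be a scan restricted to the user's row prefix combined with a value filter on column c.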
Or, if you want to avoid having multiple rows for the same user you can use versioning.
Tip: For even better query performance you could even build your own "secondary-index" table with this type of row keys: [column]_[4char_md5_of_value]_[user_id]
RowKey    value  u  d
a_afaa_1  1a     1  20140101
a_a32a_1  2a     1  20140102
a_45ae_1  3a     1  20140103
b_l413_1  1b     1  20140101
b_533a_1  2b     1  20140102
b_8ce3_1  3b     1  20140103
c_b31c_1  1c     1  20140101
c_2ca1_1  2c     1  20140102
c_a99f_1  3c     1  20140103
This would make looking up any column value very fast: compute the MD5 of the value you want to search for, take the first 4 characters of the hex string, and scan with the row prefix [column]_[hash]. Add a column value filter as well (for example SingleColumnValueFilter), because different values can share the same 4-character hash.
You can also restrict this table to the column/s you'll be querying to avoid saving data you won't need.
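The key construction could look roughly like the following sketch, which uses only the JDK. The class IndexKeySketch and its method names are hypothetical, and encoding the value as UTF-8 before hashing is an assumption; the hash fragments in the sample table above are illustrative and are not expected to match real MD5 output.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Builds row keys of the form [column]_[4char_md5_of_value]_[user_id]. Illustration only. */
final class IndexKeySketch {
    /** First 4 hex characters of the MD5 of the value (UTF-8 encoding is an assumption). */
    static String hashPrefix(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            // The first two digest bytes yield the first four hex characters.
            return String.format("%02x%02x", digest[0] & 0xff, digest[1] & 0xff);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Row prefix to scan when searching a column for a value: [column]_[hash]. */
    static String scanPrefix(String column, String value) {
        return column + "_" + hashPrefix(value);
    }

    /** Full index row key: [column]_[hash]_[user_id]. */
    static String rowKey(String column, String value, String userId) {
        return scanPrefix(column, value) + "_" + userId;
    }
}
```

To look up "2c" in column c, scan with the prefix returned by scanPrefix("c", "2c") and keep only rows whose stored value equals "2c"; the d column of each hit then gives the creation date.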
Upvotes: 1