Reputation: 952
Is there a way to retrieve the row keys in a given range without actually retrieving the columns/CFs associated with those row keys?
For clarification: in our case, the table's row keys are stock ticker symbols (e.g. GOOG), and in our web app we'd like to populate an autocomplete widget using only the row keys in the database. Obviously, if we retrieve all the data (instead of only the ticker symbols) for every stock between G and H when a user types 'G', we'll be putting unnecessary strain on our system. Any ideas?
Upvotes: 9
Views: 9892
Reputation: 131
According to the official documentation, the optimal way to retrieve only the row keys is to combine two filters, KeyOnlyFilter and FirstKeyOnlyFilter, in a FilterList with the MUST_PASS_ALL operator. KeyOnlyFilter strips the value from each cell, and FirstKeyOnlyFilter returns only the first cell of each row, so each row key comes back once even for large rows with many columns. If you only want keys in a given range, set the start and stop rows on the scanner.
Here is some example code:
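import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.filter.KeyOnlyFilter;

// keep only the first cell of each row, with its value stripped
FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL,
    new FirstKeyOnlyFilter(),
    new KeyOnlyFilter());
Scan s = new Scan();
s.setFilter(filters); // attach the filter list to the scan
// in order to limit the scan to a range
s.setStartRow(startRowKey); // first key in the range (inclusive)
s.setStopRow(stopRowKey); // key after the last key in the range (exclusive)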
Source: https://hbase.apache.org/book.html#perf.hbase.client.rowkeyonly
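When the range is "every ticker starting with what the user typed", the stop row has to be derived from the prefix (for 'G' it is 'H', matching the question's "between G and H"). The following is a minimal plain-Java sketch; the class and method names are hypothetical, and it assumes HBase's ordering of row keys as unsigned byte arrays, where an empty stop row means "scan to the end of the table":

```java
import java.util.Arrays;

// Hypothetical helper: computes the exclusive stop row for a prefix scan,
// so that [prefix, stopRow) covers exactly the row keys that start with prefix.
final class PrefixRange {
    private PrefixRange() {}

    // Returns the smallest key greater than every key starting with prefix,
    // or an empty array if none exists (every byte is 0xFF).
    static byte[] stopRowForPrefix(byte[] prefix) {
        // Walk backwards, dropping trailing 0xFF bytes, which cannot be incremented.
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++; // increment the last byte that has room (unsigned order)
                return stop;
            }
        }
        return new byte[0]; // no upper bound: scan to the end of the table
    }
}
```

With it, a prefix scan would call s.setStartRow(prefix) and s.setStopRow(PrefixRange.stopRowForPrefix(prefix)). Newer HBase clients also offer Scan.setRowPrefixFilter(byte[]), which sets both bounds from a prefix; check your client version's Javadoc before relying on it.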
Upvotes: 12
Reputation: 933
Take a look at the filters (http://hbase.apache.org/book/client.filter.html), especially KeyOnlyFilter. Its description (from http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/package-summary.html) is:
A filter that will only return the key component of each KV (the value will be rewritten as empty).
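To restrict the keys to a specific range, use the Scan(rowStart, rowEnd) constructor; the start row is inclusive and the end row is exclusive. (In HBase 2.x this constructor is deprecated in favor of new Scan().withStartRow(rowStart).withStopRow(rowEnd).)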
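Note that KeyOnlyFilter on its own still returns every cell of a row (with empty values), so the row key is transferred once per column; adding FirstKeyOnlyFilter cuts this down to one cell per row. The following plain-Java model only mimics the effect of the two filters on an in-memory, row-sorted list of cells. It is not the HBase API, and the Cell record and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the two filters' effect on cells; NOT the HBase API.
final class KeyOnlyModel {
    private KeyOnlyModel() {}

    // Hypothetical stand-in for an HBase cell: row key, column, value.
    record Cell(String row, String column, String value) {}

    // Mimics KeyOnlyFilter: keep every cell, but replace its value with "".
    static List<Cell> keyOnly(List<Cell> cells) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            out.add(new Cell(c.row(), c.column(), ""));
        }
        return out;
    }

    // Mimics FirstKeyOnlyFilter: keep only the first cell of each row
    // (assumes cells are sorted by row, as a scan returns them).
    static List<Cell> firstKeyOnly(List<Cell> cells) {
        List<Cell> out = new ArrayList<>();
        String lastRow = null;
        for (Cell c : cells) {
            if (!c.row().equals(lastRow)) {
                out.add(c);
                lastRow = c.row();
            }
        }
        return out;
    }
}
```

For a row with two columns, keyOnly alone yields two cells for that row key, while firstKeyOnly(keyOnly(cells)) yields one.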
Upvotes: 9
Reputation: 1141
I would create a column family called 'empty:' and store an empty value in it for every row. Then you can request only the column 'empty:'. This is not ideal, but it is better than loading column families with a lot of data.
Upvotes: 1
Reputation: 430
One approach would be to maintain a separate index table whose keys are all the possible FSA states (i.e. prefixes) of all the stock names. Then, whenever a user types 'G', all you have to do is read this table and retrieve, for example, a comma-separated list of all the tickers related to G.
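A minimal in-memory sketch of that index, assuming the "FSA states" are the prefixes of each ticker. The class and method names are hypothetical; in HBase each prefix would presumably be a row key in the separate index table, with the list stored as a single cell value:

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory sketch of the suggested index: every prefix of every ticker
// (e.g. "G", "GO", "GOO", "GOOG") maps to the tickers that start with it.
final class PrefixIndex {
    private final Map<String, SortedSet<String>> index = new TreeMap<>();

    // Registers a ticker under each of its non-empty prefixes.
    void add(String ticker) {
        for (int len = 1; len <= ticker.length(); len++) {
            index.computeIfAbsent(ticker.substring(0, len), k -> new TreeSet<>()).add(ticker);
        }
    }

    // Returns the sorted, comma-separated tickers for the prefix, or "" if none.
    String lookup(String prefix) {
        SortedSet<String> tickers = index.get(prefix);
        return tickers == null ? "" : String.join(",", tickers);
    }
}
```

After adding GOOG, GE and GM, lookup("G") returns "GE,GM,GOOG". The trade-off is that each new ticker writes one index entry per prefix, in exchange for each keystroke becoming a single lookup.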
Upvotes: 0
Reputation: 2378
You can use addFamily(byte[] family) or addColumn(byte[] family, byte[] qualifier) on the Scan to retrieve only the relevant column family or column.
Upvotes: 0