HBase: Custom scanner to only retrieve columns by prefix

Question

I am using a scanner to retrieve rows from HBase. I can set which columns I want back via the addColumn() method. However, I really need to be able to retrieve a variable number of columns that all start with the same prefix.

So, all the columns I want start with "USA", for example. I need to retrieve all columns that start with that, such as "USA-Virginia", "USA-Hawaii", etc. I do not want values such as "Canada-Quebec". There are no predefined values for the full column names anywhere. I just need all of them that start with "USA". Is there a way to get HBase Scanners to do this? I don't see much in the way of writing custom scanners out there.

I was looking at custom filters, but these just seem to limit the rows I get, as opposed to specifying the columns I want returned. Thoughts?

I cannot change the structure of my data, and all of my data is under a single column family.

Thanks for any ideas. I am running CDH3u4.

sulabhc · Accepted Answer

What you need is ColumnPrefixFilter, which keeps only the columns whose qualifier starts with a given prefix:
http://archive.cloudera.com/cdh/3/hbase/apidocs/org/apache/hadoop/hbase/filter/ColumnPrefixFilter.html

Something like this should do the trick:

Filter filter = new ColumnPrefixFilter(Bytes.toBytes("USA"));
scan.setFilter(filter); // scan is the Scan object you already pass to getScanner()
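
If it helps to see what "prefix" means here, below is a rough model in plain Java of the byte-wise check applied to each column qualifier. It is not HBase code: the class name PrefixDemo and both methods are made up for illustration, and it assumes the qualifiers are UTF-8 strings. The point is that the match is on the column name (not the row key) and is case-sensitive.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the check a column-prefix filter performs.
// Plain Java only; nothing here calls HBase.
class PrefixDemo {

    // True when qualifier begins with every byte of prefix, in order.
    static boolean hasPrefix(byte[] qualifier, byte[] prefix) {
        if (qualifier.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (qualifier[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Keeps only the qualifiers that start with the prefix, preserving order.
    static List<String> columnsWithPrefix(List<String> qualifiers, String prefix) {
        byte[] prefixBytes = prefix.getBytes(StandardCharsets.UTF_8);
        List<String> kept = new ArrayList<>();
        for (String qualifier : qualifiers) {
            if (hasPrefix(qualifier.getBytes(StandardCharsets.UTF_8), prefixBytes)) {
                kept.add(qualifier);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList(
                "Canada-Quebec", "USA-Hawaii", "USA-Virginia", "usa-lowercase");
        // prints [USA-Hawaii, USA-Virginia]
        System.out.println(columnsWithPrefix(row, "USA"));
    }
}
```

So with the filter set on your scan, each returned row should carry only the "USA..." columns, and "Canada-Quebec" is dropped from the result rather than the whole row being skipped.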
