sahiljain

Reputation: 2374

Optimised way of getting column values from HBase?

I only know the column family and column name in HBase, and I want to retrieve all the unique values for that column and show them in my web application's GUI. Response time is of the utmost importance.

One way is to run a scan restricted to the column family and column name, but that takes a long time and keeps the end user waiting.

Is there a more efficient way of doing this?

It would be great if you could help. Thanks.

Upvotes: 0

Views: 829

Answers (1)

Donald Miner

Reputation: 39903

There is no magic way to make scanning this data fast enough for a user interface. The scan has to rip through all the data in the column family from disk to get the information that you want. Pretty much the only things you will get from HBase in any sort of interactive way are a get on a specific rowkey or a very small range scan.

Here are a couple of high-level approaches:

  • How much do you care about latency and up-to-date results? If some staleness is acceptable, recalculate the unique list every 20 minutes with a MapReduce job or a scan and store the results in a text file somewhere.
  • Use coprocessors to determine the unique list per region, and then aggregate the per-region lists into one unique list in the client. This will likely still be too slow, but it will speed up your scan if you have lots of duplicates and your network is being saturated.
  • Rethink how you are storing your data in HBase. Unlike in an RDBMS, you can't just arbitrarily add indexes to columns. In schema design you have to think about how you are accessing your data and then base your schema on that. Are you trying to get your unique list fast? Maybe you should build a second table with the original values as rowkeys and pointers back to the original rowkeys.
  • Can you keep track of the unique values in a separate system where you can fetch that information quickly?
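
To make the "second table" idea a bit more concrete, here is a minimal sketch in plain Java. It does not use the HBase client at all: the two tables are modelled with in-memory maps, and the class and method names (`UniqueValueIndex`, `put`, `delete`, `uniqueValues`, `rowKeysFor`) are made up for illustration. The point is only the shape of the schema: every write to the main table also maintains an index table keyed by the column value, so the unique list becomes a scan over one row per distinct value instead of a scan over every row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * In-memory stand-in (hypothetical, not HBase API) for a main table plus
 * a second "index" table keyed by the column value.
 */
final class UniqueValueIndex {
    // Main table: original rowkey -> current value of the column of interest.
    private final Map<String, String> mainTable = new HashMap<>();

    // Index table: column value used as rowkey -> original rowkeys holding it.
    // A TreeMap keeps keys sorted, similar to how HBase keeps rowkeys sorted.
    private final TreeMap<String, TreeSet<String>> indexTable = new TreeMap<>();

    /** Writes a value to the main table and keeps the index table in sync. */
    public void put(String rowKey, String value) {
        String previous = mainTable.put(rowKey, value);
        if (previous != null && !previous.equals(value)) {
            // The row used to hold another value: drop the stale pointer.
            removePointer(previous, rowKey);
        }
        indexTable.computeIfAbsent(value, v -> new TreeSet<>()).add(rowKey);
    }

    /** Deletes a row from the main table and removes its index pointer. */
    public void delete(String rowKey) {
        String previous = mainTable.remove(rowKey);
        if (previous != null) {
            removePointer(previous, rowKey);
        }
    }

    /** Removes one pointer; drops the index row once nothing points to it. */
    private void removePointer(String value, String rowKey) {
        TreeSet<String> pointers = indexTable.get(value);
        if (pointers == null) {
            return;
        }
        pointers.remove(rowKey);
        if (pointers.isEmpty()) {
            indexTable.remove(value);
        }
    }

    /** The unique list: just the rowkeys of the index table, in sorted order. */
    public List<String> uniqueValues() {
        return new ArrayList<>(indexTable.keySet());
    }

    /** Pointers back to the original rows that hold the given value. */
    public List<String> rowKeysFor(String value) {
        TreeSet<String> pointers = indexTable.get(value);
        return pointers == null ? new ArrayList<>() : new ArrayList<>(pointers);
    }
}
```

For example, after `put("row1", "red")`, `put("row2", "blue")` and `put("row3", "red")`, `uniqueValues()` contains `blue` and `red` once each, and `rowKeysFor("red")` points back to `row1` and `row3`. In a real deployment the two writes go to two different tables, and HBase only guarantees atomicity within a single row, so the application has to tolerate or repair an index that briefly disagrees with the main table.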

Upvotes: 1
