I have an HBase table with about 50 million rows, and each row has several columns. My goal is to retrieve the rows that have a given value in a given column, e.g. rows whose column 'col_1' has the value 'val_1'.
I have two options to choose from:
Can anyone suggest which option runs faster, or is there a better option?
Thanks a lot!
A secondary index will be faster. You can also try a secondary-index library such as culvert instead of building your own index.
An index will certainly be faster than scanning 50M rows every time. If you use an HBase version that already has coprocessors, you can follow Xodarap's advice. If you are using an older version of HBase, you need to set up an additional table to act as the index and update it manually (either every time you update the main table, or periodically via MapReduce).
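For the manual approach (older HBase without coprocessors), here is a minimal sketch of how an index table could be laid out and kept in sync on every write. It is modelled in memory in Python rather than with the HBase client API; the composite index row key `<value><SEP><main row key>`, the `SEP` character and the `IndexedTable` class are hypothetical illustrations. The idea is that HBase stores row keys in sorted order, so all index entries for one value are contiguous and a lookup becomes a short prefix scan instead of a full-table scan.

```python
import bisect
from typing import Dict, List

SEP = "\x00"  # hypothetical separator; assumes values never contain it


class IndexedTable:
    """Main table plus a manually maintained index table (in-memory sketch).

    Index-table row keys are "<value><SEP><main row key>", kept sorted
    (as HBase keeps row keys), so rows with one value are contiguous.
    """

    def __init__(self, indexed_column: str) -> None:
        self.indexed_column = indexed_column
        self.main: Dict[str, Dict[str, str]] = {}
        self.index: List[str] = []  # sorted index-table row keys

    def put(self, row_key: str, columns: Dict[str, str]) -> None:
        old = self.main.get(row_key, {}).get(self.indexed_column)
        row = self.main.setdefault(row_key, {})
        row.update(columns)
        new = row.get(self.indexed_column)
        if old == new:
            return  # indexed value unchanged, index is still valid
        if old is not None:
            # Delete the stale index entry for the previous value.
            stale = old + SEP + row_key
            i = bisect.bisect_left(self.index, stale)
            if i < len(self.index) and self.index[i] == stale:
                self.index.pop(i)
        if new is not None:
            bisect.insort(self.index, new + SEP + row_key)

    def rows_with_value(self, value: str) -> List[str]:
        """Prefix scan over the index: seek to the prefix, read matches."""
        prefix = value + SEP
        i = bisect.bisect_left(self.index, prefix)
        result = []
        while i < len(self.index) and self.index[i].startswith(prefix):
            result.append(self.index[i][len(prefix):])
            i += 1
        return result


t = IndexedTable("col_1")
t.put("row1", {"col_1": "val_1", "col_2": "a"})
t.put("row2", {"col_1": "val_2"})
t.put("row3", {"col_1": "val_1"})
print(t.rows_with_value("val_1"))  # ['row1', 'row3']
t.put("row1", {"col_1": "val_2"})  # value changed: old index entry removed
print(t.rows_with_value("val_1"))  # ['row3']
print(t.rows_with_value("val_2"))  # ['row1', 'row2']
```

In a real cluster the main-table write and the index-table write are separate operations (HBase only guarantees atomicity within a single row), so an index updated on every write can briefly disagree with the main table. A periodic MapReduce rebuild, as mentioned above, is one way to correct such drift.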