Loom

Reputation: 9986

Obtain value from HBase table by key

There is an HBase table with tens of billions of records, where each key is a 40-byte string. I also have a list of hundreds of thousands of keys. I need to fetch all records with these keys and return the value of a certain table field. In other words, my goal is to transform a set of keys into a set of values. What is the most convenient and/or efficient way to perform this task (using any programming language and technology)?

Upvotes: 1

Views: 303

Answers (2)

kostya

Reputation: 9569

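You can use the HBase Java API. A Java sketch (imports omitted; keys is assumed to be a collection of row-key strings):

Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "ZOOKEEPER_USED_BY_HBASE");
try (Connection connection = ConnectionFactory.createConnection(conf);
     Table table = connection.getTable(TableName.valueOf("tablename"))) {
    List<Get> gets = new ArrayList<>();
    for (String key : keys) {
        gets.add(new Get(Bytes.toBytes(key)));
    }
    // One Result per Get, returned in the same order as the list
    Result[] results = table.get(gets);
}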
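
With hundreds of thousands of keys, it may be worth sending the multi-get above in several smaller batches rather than as one huge list. The following is only a sketch of the batching step in plain Java, with no HBase dependency; the class name KeyBatches, the helper partition and the batch size are hypothetical and would need tuning for a real cluster.

```java
import java.util.ArrayList;
import java.util.List;

class KeyBatches {
    // Splits keys into consecutive sublists of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < keys.size(); from += batchSize) {
            int to = Math.min(from + batchSize, keys.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(keys.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("k1", "k2", "k3", "k4", "k5");
        for (List<String> batch : partition(keys, 2)) {
            // With HBase, each batch would be turned into a List<Get>
            // and passed to table.get(...), as in the sketch above.
            System.out.println(batch);
        }
    }
}
```

Each batch then becomes one table.get call, which keeps the size of any single request and response bounded.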

A few more suggestions:

  • Have a look at the Get javadocs: you can configure a Get to return only the columns you are interested in, for example with addColumn(family, qualifier).
  • If the keys share a common prefix, a Scan with start/stop rows might work as well. If you use it, call scan.setCaching(5000) to fetch more rows per RPC, which makes it slightly faster.
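
For the Scan suggestion, the start row is the prefix itself, and the stop row is the smallest key that sorts after every key with that prefix. Here is a minimal sketch of computing that stop row in plain Java; the class PrefixRange, the helper stopRowForPrefix and the sample prefix "user42|" are hypothetical names used for illustration, not part of the HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class PrefixRange {
    // Returns the smallest byte string greater than every key starting with
    // prefix. Returns an empty array if no such bound exists (the prefix is
    // empty or consists only of 0xFF bytes).
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so drop them.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // increment the last remaining byte
        return stop;
    }

    public static void main(String[] args) {
        byte[] start = "user42|".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(start);
        // With the HBase 2.x client these would be passed as
        // new Scan().withStartRow(start).withStopRow(stop).
        System.out.println(new String(stop, StandardCharsets.UTF_8)); // prints user42}
    }
}
```

HBase treats an empty stop row as "scan to the end of the table", which is why the sketch returns an empty array when no upper bound exists.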

Upvotes: 1

SPeoples

Reputation: 45

I was testing MapReduce on MongoDB to see how efficient it is at grabbing key/value pairs from a collection. It was only a collection of 100k records, but a small JavaScript function was able to retrieve all the countries and the number of times each appeared in the collection.

// Map step: emit a count of 1 for each document's country
Map1 = function() {
    emit(this.country, 1);
};

// Reduce step: sum the counts emitted for one country
Reduce1 = function(key, vals) {
    var sum = 0;
    for (var i = 0; i < vals.length; i++) {
        sum += vals[i];
    }
    return sum;
};

Then again, I don't know how effective M/R would be with billions of records.

Upvotes: 0
