Reputation: 9986
There is an HBase table with tens of billions of records, where each key is a 40-byte string. There is also a list of hundreds of thousands of keys. I need to fetch all records with these keys and return the value of a certain table field. So my goal is to transform a set of keys into a set of values. What is the most convenient and/or efficient way to perform this task (with any programming language and technology)?
Upvotes: 1
Views: 303
Reputation: 9569
You can use the HBase Java API. In Java-like pseudocode:
conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.quorum", "ZOOKEEPER_USED_BY_HBASE")
connection = ConnectionFactory.createConnection(conf)
table = connection.getTable(TableName.valueOf("tablename"))
gets = new ArrayList<Get>()
for all keys {
    gets.add(new Get(Bytes.toBytes(key)))
}
results = table.get(gets)  // one Result per Get, in the same order as the list
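With hundreds of thousands of keys, it may be worth splitting the list into smaller batches so that each multi-get stays a manageable size. The following is only a sketch of that chunking step in plain Java; the class name `KeyBatcher`, the method `partition`, and the batch size of 10 are hypothetical and not part of the HBase API. Each batch would then be turned into one list of `Get` objects as in the pseudocode above.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyBatcher {
    // Splits keys into consecutive chunks of at most batchSize elements.
    // Hypothetical helper: not part of HBase, just plain list slicing.
    public static <T> List<List<T>> partition(List<T> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, keys.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(keys.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            keys.add("key-" + i);
        }
        for (List<String> batch : partition(keys, 10)) {
            // In a real program, each batch would become one List<Get>
            // passed to a single table.get(...) call.
            System.out.println(batch.size() + " keys, first = " + batch.get(0));
        }
    }
}
```

A suitable batch size depends on the cluster and the row size, so it is best treated as a tunable parameter rather than a fixed value.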
A few more suggestions:
Upvotes: 1
Reputation: 45
I was testing MapReduce on MongoDB to see how efficient it is at grabbing key/value pairs from a collection. It was only a collection of 100k records, but a small JavaScript function was able to retrieve all the countries and the number of times each appeared in the collection.
Map1 = function() {
    // emit one (country, 1) pair per document
    emit(this.country, 1);
};
Reduce1 = function(key, vals) {
    // sum the counts emitted for this country
    var sum = 0;
    for (var i = 0; i < vals.length; i++) {
        sum += vals[i];
    }
    return sum;
};
Then again, I don't know how effective M/R would be with billions of records.
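For readers unfamiliar with map/reduce, the two functions above amount to a group-by-count: the map step emits a 1 per country and the reduce step sums those ones per key. Below is a rough in-memory Java equivalent, purely as an illustration; the class `CountryCount` and the sample country codes are made up, and this says nothing about how MongoDB executes the job.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CountryCount {
    // Counts how many times each country appears in the input list.
    public static Map<String, Integer> countByCountry(List<String> countries) {
        // TreeMap keeps keys sorted, so the printed result is deterministic.
        Map<String, Integer> counts = new TreeMap<>();
        for (String country : countries) {
            // "map" emits (country, 1); "reduce" sums the ones for each key.
            counts.merge(country, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the collection's documents.
        List<String> sample = List.of("FR", "DE", "FR", "US", "FR");
        System.out.println(countByCountry(sample)); // prints {DE=1, FR=3, US=1}
    }
}
```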
Upvotes: 0