Reputation: 625
I'm fairly new to HBase, so my questions might seem silly. I apologize in advance :)
We have a use case where we need to store some large data in HBase. Each row is almost 30MB; we store it in 6 columns of 5MB each, plus 2 columns for some metadata, all in one column family. We have two types of data, and we use HBase as a kind of big queue!
We have created two tables in HBase, named TableA and TableB. We need to insert data (of type A or B), and we have a pull function that should get one row (of type A or B), return it, and delete it from the table.
We have three clusters nodes with 4GB RAM and sufficient storage.
For that job, in Java, I create a ResultScanner, get the key of the first row, and retrieve the whole row with a Get, something like below:
Scan scanA = new Scan();
// Only request one small column so the scan result itself stays small
scanA.addColumn(familyByteArray, oneSmallColumnByteArray);
ResultScanner scanner = tblA.getScanner(scanA);
// The big problem is here: this brings down the region servers and takes
// too long to respond
Result r = scanner.next();
// No problem here: fetch the whole row by its key
Get get = new Get(r.getRow());
r = tblA.get(get);
The first call to scanner.next() brings down the region servers (even though the stored data is small, about 8k rows). By increasing hbase.rpc.timeout I can avoid the SocketTimeoutException, but the region servers still sometimes go down on the first next().
The first scanner.next() takes, for example, 60 seconds, but subsequent scanner.next() calls answer quickly (in about 1 second).
As I mentioned before, I don't care which row is returned; I just want to get one row and return it.
Do you have any idea how to speed up scanner.next() and prevent it from killing the region servers?
Upvotes: 0
Views: 1265
Reputation: 633
First of all, what do you mean by 3 clusters? I think you meant to say a 3-node cluster.
Now, as for the solution: 4GB of RAM (is that the total memory of a node?) is not at all sufficient for HBase unless it's a local VM.
Ideally, the heap allocated for HBase should not be less than 8GB. Now I would suggest some modifications to the code.
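To put the memory point in numbers, here is a rough back-of-the-envelope sketch in plain Java (no HBase dependency). It assumes, hypothetically, that all of the roughly 8k rows mentioned in the question are close to the stated 30MB, which may not be the case. The class and method names are illustrative only and are not part of any HBase API.

```java
public class HBaseSizingSketch {
    static final long MB = 1024L * 1024L;

    // Total bytes stored if every row has the same size (an assumption).
    static long totalBytes(long rows, long rowSizeMb) {
        return rows * rowSizeMb * MB;
    }

    // Upper bound on how many whole rows of the given size fit into
    // heapMb megabytes, ignoring every other use of the heap.
    static long rowsFittingInHeap(long heapMb, long rowSizeMb) {
        return heapMb / rowSizeMb;
    }

    public static void main(String[] args) {
        long rows = 8_000;      // "almost 8k rows" from the question
        long rowSizeMb = 30;    // "almost 30MB" per row from the question

        long totalGb = totalBytes(rows, rowSizeMb) / (1024 * MB);
        System.out.println("Approx. total data: " + totalGb + " GB");
        System.out.println("Rows fitting in 4 GB: " + rowsFittingInHeap(4 * 1024, rowSizeMb));
        System.out.println("Rows fitting in 8 GB: " + rowsFittingInHeap(8 * 1024, rowSizeMb));
    }
}
```

Under these assumptions the tables hold on the order of 234 GB, and even as an upper bound a 4GB node could keep only about 136 such rows in memory at once. The real figure is lower, since the region server heap is shared with the memstore, the block cache, and the JVM itself.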
Upvotes: 1