Reputation: 2493
I want to retrieve data from HBase for my MapReduce job, but I want to filter it first. I only want to retrieve rows that contain a column with an ID greater than or equal to a minId.
I'm storing the ID in HBase as a string. Now I wonder whether this filter would work then:
int minId = 123;
Filter filter = new ValueFilter(CompareFilter.CompareOp.GREATER_OR_EQUAL, new BinaryComparator(Bytes.toBytes(minId)));
How can HBase filter my data when the stored ID is a String but the value used for the comparison is an int? Can this work? If I use a String for my BinaryComparator instead (i.e. String minId = "123";), would it work then?
Thanks for any answers!
Upvotes: 0
Views: 1384
Reputation: 35435
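A BinaryComparator compares the raw bytes lexicographically, so a string ID is compared character by character, not numerically. That means this only works if all IDs have the same number of digits. One thing you can do is zero-pad the IDs. (The int version won't work either: Bytes.toBytes(minId) produces the 4-byte big-endian encoding of the int, not the characters "123".)
So "123" > "121", but "123" < "21". If you zero-pad the IDs to the same width, "21" becomes "021", so "123" > "021" and you get the right result.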
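To see the ordering problem concretely, here is a small JDK-only sketch. It assumes that BinaryComparator orders values like an unsigned, byte-by-byte lexicographic comparison (shorter prefix first), which compareBytes below imitates. It also uses ByteBuffer.putInt as a stand-in for Bytes.toBytes(int), since both give 4 big-endian bytes. The class name, pad() helper and the 10-digit width are hypothetical choices for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LexicalIdDemo {

    // Unsigned byte-by-byte lexicographic comparison; assumed to mirror how
    // a BinaryComparator orders values (a shorter prefix sorts first).
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical helper: zero-pads a non-negative id to 10 digits,
    // which is enough for any non-negative int.
    static String pad(int id) {
        return String.format("%010d", id);
    }

    public static void main(String[] args) {
        // Unpadded strings: "123" sorts before "21", which is numerically wrong.
        System.out.println(compareBytes(utf8("123"), utf8("21")) < 0);        // true
        // Zero-padded strings ("0000000123" vs "0000000021") sort numerically.
        System.out.println(compareBytes(utf8(pad(123)), utf8(pad(21))) > 0);  // true

        // An int is 4 big-endian bytes, not the characters '1', '2', '3'.
        byte[] intBytes = ByteBuffer.allocate(4).putInt(123).array();
        System.out.println(Arrays.toString(intBytes));     // [0, 0, 0, 123]
        System.out.println(Arrays.toString(utf8("123")));  // [49, 50, 51]
    }
}
```

The last two lines suggest why the int comparator is misleading: every stored digit string starts with a byte of at least 48, while the int encoding of a small minId starts with 0. So a GREATER_OR_EQUAL filter would likely let almost every row through.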
Another option is to write a custom comparator that matches your requirements: subclass BinaryComparator and override its compareTo() method. Maybe something like this (I am just adapting the compareTo method in PureJavaComparator):
@Override
public int compareTo(byte[] buffer1, int offset1, int length1,
                     byte[] buffer2, int offset2, int length2) {
    // Skip leading zeros so that "021" and "21" are treated alike
    int l1 = getNumLeadingZeros(buffer1, offset1, length1);
    int l2 = getNumLeadingZeros(buffer2, offset2, length2);
    offset1 = offset1 + l1;
    length1 = length1 - l1;
    offset2 = offset2 + l2;
    length2 = length2 - l2;
    // If the lengths differ, the longer digit string is the larger number
    int ldiff = length1 - length2;
    if (ldiff != 0) return ldiff;
    // If the lengths are the same, the usual lexical comparison gives numeric order
    return Bytes.compareTo(buffer1, offset1, length1, buffer2, offset2, length2);
}
public int getNumLeadingZeros(byte[] arr, int offset, int length) {
    byte zero = '0';
    int i = 0;
    // Count consecutive '0' bytes from the start of the value
    while (i < length && arr[offset + i] == zero) {
        ++i;
    }
    return i;
}
It's not heavily optimized, and it assumes there are no malformed values (for example non-digit characters or negative numbers). You can also skip the leading-zero handling if you are sure no IDs will have leading zeros. I have not tested it, so try it out and let me know if it works!
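One way to try the comparison logic without a cluster is a small standalone harness. This is a sketch under stated assumptions. It reimplements the same "strip leading zeros, then compare lengths, then compare bytes" idea in plain Java. It swaps HBase's Bytes.compareTo for a hand-written unsigned byte loop so that nothing beyond the JDK is needed. The class name and the compare() helper are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class NumericStringCompareCheck {

    // Counts leading '0' characters in arr[offset, offset + length).
    static int getNumLeadingZeros(byte[] arr, int offset, int length) {
        int i = 0;
        while (i < length && arr[offset + i] == '0') {
            i++;
        }
        return i;
    }

    // Strip leading zeros; a longer digit string is the larger number;
    // equal lengths fall back to an unsigned byte-wise comparison
    // (a stand-in for Bytes.compareTo).
    static int compareNumericStrings(byte[] b1, int o1, int l1,
                                     byte[] b2, int o2, int l2) {
        int z1 = getNumLeadingZeros(b1, o1, l1);
        int z2 = getNumLeadingZeros(b2, o2, l2);
        o1 += z1;
        l1 -= z1;
        o2 += z2;
        l2 -= z2;
        if (l1 != l2) {
            return l1 - l2;
        }
        for (int i = 0; i < l1; i++) {
            int diff = (b1[o1 + i] & 0xff) - (b2[o2 + i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return 0;
    }

    // Hypothetical convenience wrapper for String inputs.
    static int compare(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        return compareNumericStrings(x, 0, x.length, y, 0, y.length);
    }

    public static void main(String[] args) {
        System.out.println(compare("123", "21") > 0);   // true: 123 > 21
        System.out.println(compare("021", "21") == 0);  // true: leading zero ignored
        System.out.println(compare("99", "0100") < 0);  // true: 99 < 100
        System.out.println(compare("0", "000") == 0);   // true: both are zero
    }
}
```

If these cases behave as expected, the same logic can be moved into the comparator subclass. That class still has to be on the region servers' classpath for the filter to use it.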
Upvotes: 1