Julias

Reputation: 5892

How to efficiently write a scanner filter when the row key contains a timestamp

I have an HBase table where every row key has the structure ID,DATE,OTHER_DETAILS. For example:

10,2012-05-01,"some details"
10,2012-05-02,"some details"
10,2012-05-03,"some details"
10,2012-05-04,"some details"

...

How can I write a scan that gets all the rows older than a given date? For example, 2012-05-01 and 2012-05-02 are older than 2012-05-03.

 Scan scan = new Scan();
 Filter f = ???   
 scan.setFilter(f);
 scan.setCaching(1000);
 ResultScanner rs = table.getScanner(scan);
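Whatever filter ends up in the scan, the "older than" test itself is a plain string comparison: the dates in these keys are zero-padded yyyy-MM-dd, so lexicographic order matches chronological order. A minimal plain-Java sketch of that test, assuming every key has exactly the form ID,DATE,OTHER_DETAILS (the class and method names are hypothetical, and no HBase API is involved):

```java
import java.util.List;
import java.util.stream.Collectors;

public class OlderThanCheck {

    // Hypothetical helper: returns the DATE field of an "ID,DATE,OTHER_DETAILS" key.
    // The limit of 3 keeps any commas inside OTHER_DETAILS in the last field.
    static String dateOf(String rowKey) {
        return rowKey.split(",", 3)[1];
    }

    // True when the key's date is strictly before the cutoff. Zero-padded
    // yyyy-MM-dd strings sort chronologically, so compareTo() is enough.
    static boolean isOlderThan(String rowKey, String cutoff) {
        return dateOf(rowKey).compareTo(cutoff) < 0;
    }

    public static void main(String[] args) {
        List<String> keys = List.of(
            "10,2012-05-01,\"some details\"",
            "10,2012-05-02,\"some details\"",
            "10,2012-05-03,\"some details\"",
            "10,2012-05-04,\"some details\"");
        List<String> older = keys.stream()
            .filter(k -> isOlderThan(k, "2012-05-03"))
            .collect(Collectors.toList());
        System.out.println(older);
    }
}
```

Running it prints the two keys dated 2012-05-01 and 2012-05-02.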

Upvotes: 1

Views: 1421

Answers (2)

Alexander Kuznetsov

Reputation: 3112

You can create your own Filter and implement the method filterRowKey. To make the scan faster, you can also implement getNextKeyHint, which lets the scanner seek ahead instead of reading every row, but this is a bit more complicated. The disadvantage of this approach is that you need to put the jar file with your filter on the HBase classpath and restart the cluster.

Here is an approximate implementation of such a filter:

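// Methods of a custom FilterBase subclass (HBase 0.94-era KeyValue API).
// Assumes row keys "ID,yyyy-MM-dd,OTHER_DETAILS": zero-padded dates sort
// chronologically, so String.compareTo() can compare them.
private String startDate;   // inclusive lower bound, e.g. "2012-05-01"
private String endDate;     // inclusive upper bound, e.g. "2012-05-02"
private boolean filterOutRow = false;

// Illustrative helpers: split the row key on its first two commas.
private static String getId(String rowKey)   { return rowKey.split(",", 3)[0]; }
private static String getDate(String rowKey) { return rowKey.split(",", 3)[1]; }

@Override
public void reset() {
    // Called before each new row
    this.filterOutRow = false;
}

@Override
public boolean filterRowKey(byte[] data, int offset, int length) {
    String date = getDate(Bytes.toString(data, offset, length));
    // Mark rows whose date lies OUTSIDE [startDate, endDate]
    this.filterOutRow = date.compareTo(startDate) < 0 || date.compareTo(endDate) > 0;
    // Return false so filterKeyValue() still runs and can request a seek;
    // returning true would skip just this one row without using the hint
    return false;
}

@Override
public Filter.ReturnCode filterKeyValue(KeyValue v) {
    if (this.filterOutRow) {
        // Ask the scanner to jump to the key returned by getNextKeyHint()
        return ReturnCode.SEEK_NEXT_USING_HINT;
    }
    return Filter.ReturnCode.INCLUDE;
}

@Override
public KeyValue getNextKeyHint(KeyValue currentKV) {
    String rowKey = Bytes.toString(currentKV.getRow());
    String id = getId(rowKey);
    String date = getDate(rowKey);
    if (date.compareTo(startDate) < 0) {
        // Too old: jump to this ID's first possible key at startDate
        return KeyValue.createFirstOnRow(Bytes.toBytes(id + "," + startDate));
    }
    if (date.compareTo(endDate) > 0) {
        // Too new: jump past all keys of this ID. "ID-" sorts right after every
        // "ID,..." key because '-' is ',' + 1 in ASCII (unlike ID+1, it does
        // not skip IDs such as "100" that sort between "10" and "11")
        return KeyValue.createFirstOnRow(Bytes.toBytes(id + "-"));
    }
    return null;
}

@Override
public boolean filterRow() {
    // Drop the row if it was marked in filterRowKey()
    return this.filterOutRow;
}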
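The skip logic of such a filter can be tried out without a cluster. The plain-Java sketch below mirrors the filterRowKey()/getNextKeyHint() decisions on a sorted set of string keys, using TreeSet.ceiling() in place of the region server's seek. It is a simulation under stated assumptions, not HBase code: the class and helper names are hypothetical, and for rows past the range it jumps to "ID-" (the smallest key after every "ID,..." key) rather than to ID+1, because with string IDs "100" sorts between "10" and "11".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class SeekHintSimulation {

    // Hypothetical helpers for "ID,DATE,OTHER_DETAILS" keys.
    static String idOf(String key)   { return key.split(",", 3)[0]; }
    static String dateOf(String key) { return key.split(",", 3)[1]; }

    // Mimics a scan with a seek-hint filter: rows inside [startDate, endDate]
    // are returned; for any other row the scan jumps to a hinted key instead
    // of stepping row by row. Every jump moves strictly forward, so it ends.
    static List<String> scan(TreeSet<String> table, String startDate, String endDate) {
        List<String> result = new ArrayList<>();
        String key = table.isEmpty() ? null : table.first();
        while (key != null) {
            String id = idOf(key);
            String date = dateOf(key);
            if (date.compareTo(startDate) < 0) {
                // Too old: seek to this ID's first possible key at startDate
                key = table.ceiling(id + "," + startDate);
            } else if (date.compareTo(endDate) > 0) {
                // Too new: seek past every key of this ID ('-' is ',' + 1)
                key = table.ceiling(id + "-");
            } else {
                result.add(key);
                key = table.higher(key); // in range: move to the next row
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeSet<String> table = new TreeSet<>(List.of(
            "10,2012-05-01,a", "10,2012-05-02,b", "10,2012-05-03,c",
            "10,2012-05-04,d", "11,2012-04-30,e", "11,2012-05-02,f"));
        System.out.println(scan(table, "2012-05-02", "2012-05-03"));
    }
}
```

With these sample keys the scan returns 10,2012-05-02,b, 10,2012-05-03,c and 11,2012-05-02,f; reaching 10,2012-05-04,d makes it jump straight to the first key of ID 11.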

Upvotes: 2

Tariq

Reputation: 34184

Store the key of the very first row somewhere. Because it is the 'first' row, it is older than all other rows (am I correct?), so it will always be in your final result set.

Now take the date you want to filter on and create a RowFilter with a RegexStringComparator built from this date. This gives you the row matching the specified criteria. Then do a range query between this row and the first row you stored earlier.

And if you have multiple rows with the same date, say:

10,2012-05-04,"some details"
10,2012-05-04,"some new details"

take the last row returned by the RowFilter and use the same technique.

HTH

I was trying to say that you can use a range query to achieve this. The "startrowkey" will be the first row of your table: being the first row, it will always be the oldest, which means it will always be in your result. The "stoprowkey" for your range query will be the row that contains the given date. To find the stoprowkey, you can set a RowFilter with a RegexStringComparator.

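// Step 1: find the stop row with a regex RowFilter
byte[] startRowKey = FIRST_ROW_OF_THE_TABLE; // the key you stored earlier
byte[] stopRowKey = null;                    // declared outside the loop so it is still in scope below
Scan scan = new Scan();
Filter rowFilter = new RowFilter(CompareFilter.CompareOp.EQUAL, new RegexStringComparator("YOUR_REGEX"));
scan.setFilter(rowFilter);
ResultScanner scanner1 = table.getScanner(scan);
for (Result res : scanner1) {
    stopRowKey = res.getRow(); // ends up holding the last matching row
}
scanner1.close();

// Step 2: range query; a fresh Scan so the regex filter is not applied again
Scan rangeScan = new Scan();
rangeScan.setStartRow(startRowKey);
rangeScan.setStopRow(stopRowKey); // the stop row itself is excluded
ResultScanner scanner2 = table.getScanner(rangeScan);
for (Result res : scanner2) {
    // your final result
}
scanner2.close();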
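The two-step idea (find the stop row with a regex, then range-scan from the first row) can be simulated on a sorted set of keys. In this plain-Java sketch, java.util.regex stands in for RegexStringComparator and NavigableSet.subSet() stands in for a Scan whose start row is inclusive and stop row exclusive. The names are hypothetical, and it assumes, as in the question's example, that all keys share one ID: keys sort by ID first, so with several IDs the range would mix rows of different IDs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

public class TwoStepRangeScan {

    // Step 1: emulate a regex RowFilter scan and keep the LAST matching key,
    // which becomes the (exclusive) stop row. Returns null if nothing matches.
    static String findStopRow(NavigableSet<String> table, String regex) {
        Pattern p = Pattern.compile(regex);
        String stop = null;
        for (String key : table) {          // iterates keys in sorted order
            if (p.matcher(key).find()) {
                stop = key;
            }
        }
        return stop;
    }

    // Step 2: emulate a range scan: start row inclusive, stop row exclusive.
    static List<String> rangeScan(NavigableSet<String> table, String startRow, String stopRow) {
        return new ArrayList<>(table.subSet(startRow, true, stopRow, false));
    }

    public static void main(String[] args) {
        NavigableSet<String> table = new TreeSet<>(List.of(
            "10,2012-05-01,a", "10,2012-05-02,b",
            "10,2012-05-03,c", "10,2012-05-04,d"));
        String stop = findStopRow(table, "^10,2012-05-03,");
        System.out.println(rangeScan(table, table.first(), stop));
    }
}
```

This prints the rows dated 2012-05-01 and 2012-05-02. Because the stop row is exclusive, keeping the last match means that when several rows share the cutoff date, the earlier ones fall inside the range; keeping the first match instead would exclude all of them.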

Upvotes: 0
