Reputation: 3345
What is the easiest way to count the rows of an HBase table for a given time period, based on the insertion timestamp? The only thing I have found is:
hbase> count 't1', INTERVAL => 100000
This does not solve my problem. There seems to be another option, but it returns 0 results:
hbase> get 'hbase_output', '*', {TIMERANGE => [1445212800,1445299200]}
COLUMN CELL
0 row(s) in 0.0900 seconds
Are these the only two options? I put '*' as the row key hoping to match all rows in the table, and I suspect that is incorrect.
Upvotes: 0
Views: 3627
Reputation: 1409
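Since HBase 2.0 you can specify filters for the count command. Note that TimestampsFilter matches cells whose timestamp equals one of the listed values, not a range.
E.g.: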
hbase> count 't1', FILTER => "(QualifierFilter (>=, 'binary:xyz')) AND (TimestampsFilter ( 123, 456))"
https://issues.apache.org/jira/browse/HBASE-18001
https://github.com/apache/hbase/blob/master/hbase-shell/src/main/ruby/shell/commands/count.rb
Upvotes: 0
Reputation: 7138
HBase maintains a timestamp, and also versions, for each cell.
get retrieves a specific record by its row key, so '*' is treated as a literal row key, not a wildcard. Once you meet that criterion, get has additional options for selecting versions and timestamps.
scan retrieves all records. Again, you have the option to specify versions and a timestamp range. However, scan returns the full list of records rather than performing a count operation.
So I am afraid your best bet is to write a MapReduce job that scans with a timestamp range and counts the rows. In fact, the MapReduce RowCounter is the best way to count HBase rows compared with the shell's count command.
I have worked on something similar. I started with the RowCounter source code and tweaked it to add a filter. For the date, you can maintain your own field, or use the most recent date of any column qualifier (as long as the entire record is stored into HBase together). Otherwise, if parts of your row are saved separately, you have to use a specific column qualifier.
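One thing worth checking before writing a job: HBase assigns cell timestamps in epoch milliseconds by default, while the range in the question (1445212800 to 1445299200) looks like epoch seconds. The sketch below, in Python purely for illustration, converts a date range to milliseconds and builds a shell scan with TIMERANGE. The helper names are hypothetical, and it assumes the table uses the default server-assigned timestamps rather than values written explicitly by the client.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(dt):
    """Convert a timezone-aware datetime to epoch milliseconds."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Integer division of timedeltas avoids floating-point rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def scan_command(table, start, end):
    """Build an HBase shell scan limited to [start, end) by cell timestamp.

    Hypothetical helper: it only formats the command string, it does not
    talk to HBase.
    """
    lo, hi = to_epoch_millis(start), to_epoch_millis(end)
    if lo >= hi:
        raise ValueError("start must be before end")
    return "scan '%s', {TIMERANGE => [%d, %d]}" % (table, lo, hi)


if __name__ == "__main__":
    # The same one-day window as in the question, expressed in UTC.
    start = datetime(2015, 10, 19, tzinfo=timezone.utc)
    end = datetime(2015, 10, 20, tzinfo=timezone.utc)
    print(scan_command("hbase_output", start, end))
```

The shell prints a row count summary at the end of a scan, which may be enough for a small table. For a large table, the RowCounter usage text in many HBase versions lists --starttime and --endtime options, so it is worth checking the usage output on your version before modifying the source.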
Upvotes: 1