Reputation: 18601
I need to scan my HBase table and return only those rows that changed in a given time period (e.g., the last hour). Note that I need the entire row, not only the cell/value that changed.
The table has billions of rows and a couple of column families. It is updated regularly (sometimes we update the entire row, sometimes only a given cell). Looking at the documentation, it seems that TimestampFilter returns only 'cells', and using setTimeRange in Get or Scan returns only 'columns'. I need the entire row. Is this possible through the API? If not, what's an efficient workaround?
Upvotes: 2
Views: 1770
Reputation: 7138
The fact that you sometimes update the entire row and sometimes only a single column makes life difficult. I had a similar problem getting a count based on timestamp. Since the timestamp is at the cell level and we only ever insert entire rows, I used a MapReduce job to group by timestamp (formatted back to a date) and then count. You can use a similar approach, except that you would have to choose the individual columns to inspect and work out when each was modified.
Upvotes: 0
Reputation: 1126
With TimestampFilter you can get the cells that were written in a given time period. If you want the entire row, you will then need to do a get on that particular rowkey. I don't think this is an efficient way.
I would recommend building a time-series table. Can you use the timestamp as a suffix in your rowkey?
Have a look at section 6.3.1: http://hbase.apache.org/0.94/book/rowkey.design.html
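To make the suffix idea concrete, here is a minimal sketch in plain Java (no HBase dependency) of a rowkey with the timestamp appended. The class name SuffixedRowKey, the layout (UTF-8 entity id followed by an 8-byte big-endian timestamp), and the sample id are assumptions for illustration, not something taken from your schema.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: rowkey = <entityId bytes><8-byte big-endian timestamp>.
final class SuffixedRowKey {

    // Builds the key; each update of an entity becomes its own row.
    static byte[] build(String entityId, long timestampMillis) {
        byte[] id = entityId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(id.length + Long.BYTES)
                .put(id)
                .putLong(timestampMillis) // ByteBuffer is big-endian by default
                .array();
    }

    // Reads the timestamp back from the last 8 bytes of the key.
    static long timestampOf(byte[] rowKey) {
        return ByteBuffer.wrap(rowKey, rowKey.length - Long.BYTES, Long.BYTES).getLong();
    }
}
```

Because the timestamp is big-endian and non-negative, keys for the same entity sort chronologically as raw bytes. In a real table you would probably want fixed-width ids so that one id can never be a prefix of another.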
If you need to use the timestamp as a prefix, then you will need to do salting.
Have a look at this for salting: https://phoenix.apache.org/salted.html
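A rough sketch of what salting a timestamp-prefixed key could look like, again in plain Java with no HBase dependency. The class SaltedTimeKey, the bucket count of 8, and the choice to derive the salt from the entity id are all assumptions for illustration. Phoenix does the equivalent for you when you declare SALT_BUCKETS, and its exact hashing may differ.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: rowkey = <1-byte salt><8-byte timestamp><entityId bytes>.
final class SaltedTimeKey {
    static final int BUCKETS = 8; // assumed bucket count; tune to your region count

    // Salt is derived from the entity id, so writes arriving at the same
    // moment are spread over BUCKETS key ranges instead of one hot region.
    static byte saltFor(byte[] id) {
        return (byte) Math.floorMod(Arrays.hashCode(id), BUCKETS);
    }

    static byte[] build(long timestampMillis, String entityId) {
        byte[] id = entityId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(1 + Long.BYTES + id.length)
                .put(saltFor(id))
                .putLong(timestampMillis)
                .put(id)
                .array();
    }

    // A time-range read needs one scan per bucket; these are the start keys.
    static byte[][] scanStartKeys(long fromMillis) {
        byte[][] keys = new byte[BUCKETS][];
        for (int b = 0; b < BUCKETS; b++) {
            keys[b] = ByteBuffer.allocate(1 + Long.BYTES)
                    .put((byte) b)
                    .putLong(fromMillis)
                    .array();
        }
        return keys;
    }
}
```

The trade-off is on the read side: fetching "everything changed in the last hour" turns into BUCKETS separate range scans whose results you merge, rather than a single scan.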
Upvotes: 2