Marsellus Wallace

Reputation: 18601

Is it possible to Scan rows that changed after a certain date in HBase?

I need to scan my HBase table and return only those rows that changed in a given time period (e.g., the last hour). Note that I need the entire row, not only the cell/value that changed.

The table has billions of rows and a couple of column families. It is updated regularly (sometimes we update the entire row, sometimes only a given cell). According to the documentation, TimestampsFilter returns only the matching cells, and setTimeRange on a Get or Scan returns only the matching columns. I need the entire row. Is this possible through the API? If not, what is an efficient workaround?

Upvotes: 2

Views: 1770

Answers (2)

Ramzy

Reputation: 7138

The fact that you sometimes update the entire row and sometimes only a single column makes life difficult. I had a similar problem getting a count based on timestamp. Since the timestamp is at the cell level and we only ever insert entire rows, I used a MapReduce job to group by timestamp (formatted back to a date) and then count. You can use a similar approach, except that you would have to look at your individual columns and when each of them was modified.
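To make the idea concrete, here is a minimal plain-Java sketch of the logic such a job would need: keep every cell whose timestamp falls in the window, then collapse the results to distinct row keys. It is not a MapReduce job and does not touch HBase; the `Cell` class, the `changedRowKeys` method, and the sample data are all hypothetical stand-ins for illustration.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ChangedRows {
    /** Hypothetical in-memory stand-in for one HBase cell. */
    static final class Cell {
        final String rowKey;
        final String column;
        final long timestamp; // write time in milliseconds

        Cell(String rowKey, String column, long timestamp) {
            this.rowKey = rowKey;
            this.column = column;
            this.timestamp = timestamp;
        }
    }

    /**
     * "Map" step: pick every cell written in [start, end).
     * "Reduce" step: the set collapses duplicates, so a row with
     * several changed cells is reported only once.
     */
    static Set<String> changedRowKeys(List<Cell> cells, long start, long end) {
        Set<String> changed = new TreeSet<>();
        for (Cell c : cells) {
            if (c.timestamp >= start && c.timestamp < end) {
                changed.add(c.rowKey);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
            new Cell("row1", "cf1:a", 1_000L),
            new Cell("row1", "cf1:b", 5_000L), // only this cell of row1 changed
            new Cell("row2", "cf1:a", 2_000L),
            new Cell("row3", "cf2:x", 6_000L));
        // Rows with at least one cell written in [4000, 7000)
        System.out.println(changedRowKeys(cells, 4_000L, 7_000L));
    }
}
```

The result is only the set of row keys. Fetching the complete rows would still take a second pass over those keys.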

Upvotes: 0

Anil Gupta

Reputation: 1126

With TimestampsFilter you can get the cells that were written in a given time period. If you want the entire row, you will then need to do a Get on that particular rowkey. I don't think this is an efficient way.
I would recommend building a time-series table. Can you use the timestamp as a suffix in your rowkey? Have a look at section 6.3.1: http://hbase.apache.org/0.94/book/rowkey.design.html

If you need to use the timestamp as a prefix, then you will need to do salting.
Have a look at this for salting: https://phoenix.apache.org/salted.html
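As a rough illustration of what a salted, timestamp-prefixed key could look like, here is a hand-rolled sketch in plain Java. It is not Phoenix's implementation. The bucket count, the key layout, and the names `SaltedTimeKey`, `saltFor`, `rowKey`, and `scanBound` are all assumptions made for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class SaltedTimeKey {
    // Assumption: 16 salt buckets; a real table would tune this to its cluster.
    static final int BUCKETS = 16;

    /** Salt derived from the entity id, so a given entity always maps to the same bucket. */
    static byte saltFor(String entityId) {
        return (byte) Math.floorMod(entityId.hashCode(), BUCKETS);
    }

    /** Assumed layout: [1-byte salt][8-byte big-endian timestamp][entity id bytes]. */
    static byte[] rowKey(String entityId, long timestampMillis) {
        byte[] id = entityId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(1 + Long.BYTES + id.length)
                .put(saltFor(entityId))
                .putLong(timestampMillis) // ByteBuffer is big-endian by default
                .put(id)
                .array();
    }

    /** Key prefix marking one end of a time-window scan inside a single bucket. */
    static byte[] scanBound(int bucket, long timestampMillis) {
        return ByteBuffer.allocate(1 + Long.BYTES)
                .put((byte) bucket)
                .putLong(timestampMillis)
                .array();
    }
}
```

With a layout like this, reading "the last hour" means one range scan per bucket, from `scanBound(b, start)` to `scanBound(b, end)`, with the results merged on the client. The salt byte spreads writes that share nearly the same timestamp across buckets instead of sending them all to one region.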

Upvotes: 2
