Akhil

Reputation: 646

Sync Data From HBase To Hive

We are working on a project that uses HBase as an operational data store; all data arrives in HBase in real time. Every 2 hours, the data in HBase needs to be synced to Hive so that analytical queries run on top of the latest data.

For syncing data from HBase to Hive:

For insert/update-only scenarios, I can use the cell timestamp provided by HBase to identify the inserted/updated records. For "DELETE" scenarios, I am struggling to find the right approach.

Does the HBase Scan API provide any option to detect deleted records?

Or should I go with a SQL option such as Apache Phoenix to do the same?

Upvotes: 1

Views: 398

Answers (1)

mazaneicha

Reputation: 9425

Here is the answer from the HBase Reference Guide, section "Keep Deleted Cells":

A new "raw" scan options returns all deleted rows and the delete markers...

. . .[example]

hbase(main):017:0> scan 'test', {RAW=>true, VERSIONS=>1000}

ROW COLUMN+CELL
r1 column=e:c1, timestamp=14, value=value
r1 column=e:c1, timestamp=12, value=value
r1 column=e:c1, timestamp=11, type=DeleteColumn
r1 column=e:c1, timestamp=10, value=value

1 row(s) in 0.0120 seconds

. . .

Note that there can be different types of markers (for example, DeleteColumn or DeleteFamily), depending on what kind of DELETE has occurred.
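To connect this back to the 2-hour sync, here is a minimal Python sketch of how the raw scan output might be turned into a per-column "upsert" or "delete" decision. It assumes the shell output has exactly the line format quoted above; the names `parse_raw_scan`, `net_changes`, and `since_ts` are hypothetical, and only Put and DeleteColumn are handled. A DeleteFamily marker applies to every column in the family, so it is only flagged for review here rather than resolved.

```python
import re
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    row: str
    column: str
    timestamp: int
    kind: str              # "Put", "DeleteColumn", "DeleteFamily", ...
    value: Optional[str]   # None for delete markers


# Assumption: lines look like the raw scan output quoted above, e.g.
#   r1 column=e:c1, timestamp=11, type=DeleteColumn
CELL_RE = re.compile(
    r"^(?P<row>\S+)\s+column=(?P<column>[^,]+),\s*timestamp=(?P<ts>\d+),\s*"
    r"(?:type=(?P<type>\w+)|value=(?P<value>.*))$"
)


def parse_raw_scan(text):
    """Parse raw-scan shell output into Cell tuples, skipping non-cell lines."""
    cells = []
    for line in text.splitlines():
        m = CELL_RE.match(line.strip())
        if m:
            # Lines without type=... carry a value, so treat them as puts.
            cells.append(Cell(m["row"], m["column"], int(m["ts"]),
                              m["type"] or "Put", m["value"]))
    return cells


def net_changes(cells, since_ts):
    """Return {(row, column): action} for columns touched after since_ts.

    The newest cell per column decides the action. On a timestamp tie the
    delete marker is preferred, since a marker masks a put at the same
    timestamp.
    """
    newest = {}
    for c in cells:
        key = (c.row, c.column)
        rank = (c.timestamp, c.kind != "Put")
        if key not in newest or rank > (newest[key].timestamp,
                                        newest[key].kind != "Put"):
            newest[key] = c

    changes = {}
    for key, c in newest.items():
        if c.timestamp <= since_ts:
            continue  # nothing new since the last sync
        if c.kind == "Put":
            changes[key] = "upsert"
        elif c.kind == "DeleteColumn":
            changes[key] = "delete"
        else:
            changes[key] = "needs-review"  # e.g. DeleteFamily: not handled here
    return changes


SAMPLE = """\
ROW COLUMN+CELL
r1 column=e:c1, timestamp=14, value=value
r1 column=e:c1, timestamp=12, value=value
r1 column=e:c1, timestamp=11, type=DeleteColumn
r1 column=e:c1, timestamp=10, value=value
1 row(s) in 0.0120 seconds
"""

if __name__ == "__main__":
    print(net_changes(parse_raw_scan(SAMPLE), since_ts=11))
```

With the sample from the guide, the DeleteColumn marker at timestamp 11 is followed by newer puts, so the column counts as an upsert rather than a delete. In a real job the same decision would more likely be made on cells returned by a raw scan through the client API than on parsed shell text.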

Upvotes: 1
