hba

Reputation: 7800

Is there a clever HBase Schema to Aid with Discovering Missing Value?

Let's assume I have billions of rows in my HBase table. The set of rows changes slowly: new row keys are added and some row keys get deleted.

I receive lots of events per row. However, there will be very few rows that will not have any events associated with them.

At the end of the day I would like to report on the rows that have not received any events.

My naive solution would be to introduce a column cf:c that holds a flag, set the flag to 1 every time I see an event for a row, and then do a full scan of the table looking for row keys that are missing the column value. That seems wasteful, because I would be reading through 10 billion rows to discover a handful of row keys (we are talking about hundreds or low thousands).
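A minimal in-memory sketch of the naive approach makes the cost visible. It assumes a Python dict stands in for the HBase table (row key -> {column: value}); the names `record_event` and `rows_missing_events` are hypothetical and are not HBase API. Every row has to be read to find the few that lack the flag.

```python
# Hypothetical stand-in for the HBase table: row key -> {column: value}.
# It only illustrates the naive "flag column + full scan" idea; it is not HBase code.


def record_event(table: dict[str, dict[str, bytes]], row_key: str) -> None:
    """Set the cf:c flag to 1 whenever an event arrives for an existing row."""
    if row_key in table:
        table[row_key]["cf:c"] = b"1"


def rows_missing_events(table: dict[str, dict[str, bytes]]) -> tuple[list[str], int]:
    """Full scan: every row must be read to check whether cf:c is absent."""
    missing = []
    rows_read = 0
    for row_key in sorted(table):  # HBase returns rows in row key order
        rows_read += 1
        if "cf:c" not in table[row_key]:
            missing.append(row_key)
    return missing, rows_read


table = {f"row{i:04d}": {} for i in range(1000)}
for i in range(1000):
    if i % 250 != 7:  # a handful of rows never receive an event
        record_event(table, f"row{i:04d}")

missing, rows_read = rows_missing_events(table)
print(missing, rows_read)  # ['row0007', 'row0257', 'row0507', 'row0757'] 1000
```

Four missing rows are found, but all 1000 rows are read; at 10 billion rows the same ratio is what makes the approach wasteful.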

Is there a clever way to design the HBase schema so that the row keys that are missing events can be found quickly (without going through every row)?

Upvotes: 2

Views: 114

Answers (1)

Ram Ghadiyaram

Reputation: 29247

If I understood correctly, you have row keys xxxxyyyyzzzz1 ... xxxxyyyyzzzzn. Some rows have events and others have none, c is your flag for whether a row has events, and you have a huge amount of data.


Rule of thumb in HBase: filtering on the row key is generally faster and more efficient than filtering on column values, because a column value filter has to read that column for every row (to search for that flag, a full table scan is required).

Your approach of scanning the entire table for missing events with a column value filter will lead to a full table scan, which is not efficient.

Conclusion: You have to use a row key filter to scan such a big table.
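The general reason row keys help is that HBase stores rows sorted by row key, so a scan bounded by start and stop rows (or a filter that provides seek hints) can jump to the relevant keys instead of reading every row's columns. The sketch below models only that bounded-scan idea with a sorted Python list and `bisect`; `prefix_stop_row` and `range_scan` are hypothetical helpers, not HBase calls.

```python
import bisect


def prefix_stop_row(prefix: str) -> str:
    """Smallest key greater than every key starting with `prefix`
    (assumes the last character is not the maximum code point)."""
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def range_scan(sorted_keys: list[str], start: str, stop: str) -> list[str]:
    """Keys in [start, stop), like a scan bounded by start and stop rows.

    Locating the range costs O(log n) comparisons; keys outside it are never read.
    """
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


keys = sorted(f"{p}_{i:04d}" for p in ("a", "b", "c") for i in range(1000))
hits = range_scan(keys, "b_", prefix_stop_row("b_"))
print(len(hits), hits[0], hits[-1])  # 1000 b_0000 b_0999
```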

So I'd suggest you write the flag into the row key. For example:

0 -- no events

1 -- has events

xxxxyyyyzzzz1_0 // row with no events

xxxxyyyyzzzz1_1 // row with events
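A sketch of this key layout, assuming a dict stands in for the table (`make_key` and `mark_event` are hypothetical helpers, not HBase API). One consequence worth keeping in mind: HBase row keys cannot be edited in place, so flipping the flag means writing the row under the new key and deleting the old one.

```python
# Hypothetical helpers for the "flag in the row key" layout; a dict stands in
# for the HBase table and is NOT the HBase client API.
NO_EVENTS = "0"
HAS_EVENTS = "1"


def make_key(row_id: str, flag: str) -> str:
    """Build '<row id>_<flag>', e.g. 'xxxxyyyyzzzz1_0'."""
    return f"{row_id}_{flag}"


def mark_event(table: dict[str, dict[str, bytes]], row_id: str) -> None:
    """Move a row from <id>_0 to <id>_1 on its first event.

    Row keys cannot be changed in place, so in HBase this would be a Put of the
    cells under the new key plus a Delete of the old key (two operations on
    different rows, not atomic as a pair). Later events are a no-op here.
    """
    old_key = make_key(row_id, NO_EVENTS)
    if old_key in table:
        table[make_key(row_id, HAS_EVENTS)] = table.pop(old_key)


table = {make_key(f"xxxxyyyyzzzz{i}", NO_EVENTS): {"cf:data": b"..."} for i in range(1, 4)}
mark_event(table, "xxxxyyyyzzzz1")
mark_event(table, "xxxxyyyyzzzz3")
mark_event(table, "xxxxyyyyzzzz3")  # repeated events do not move the row again
print(sorted(table))  # ['xxxxyyyyzzzz1_1', 'xxxxyyyyzzzz2_0', 'xxxxyyyyzzzz3_1']
```

Because the flag is part of the key, any code that reads a row by id also has to know (or try both) of its current suffixes.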

Now, assuming fixed-length row keys (FuzzyRowFilter matches bytes by position), you can use a FuzzyRowFilter to find the rows with missing events and build your report.
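In the Java client, FuzzyRowFilter is built from a list of `Pair<byte[], byte[]>` (row key template, mask) and set on a Scan; in the mask, 0 marks a byte that must match the template and 1 marks a byte that may be anything. The Python below is only a simplified model of that matching rule, assuming 13-byte ids as in the example keys; `fuzzy_match` and `rows_without_events` are hypothetical names, and the model ignores the seek hints the real filter uses.

```python
# Simplified model of FuzzyRowFilter's matching rule (not the HBase API):
# mask byte 0 = position must equal the template, 1 = any byte allowed.
# This sketch also requires keys to have the same length as the template.


def fuzzy_match(row_key: bytes, template: bytes, mask: bytes) -> bool:
    if len(row_key) != len(template):
        return False
    return all(m == 1 or r == t for r, t, m in zip(row_key, template, mask))


def rows_without_events(sorted_keys: list[bytes], id_len: int) -> list[bytes]:
    """Match '<any id_len bytes>_0': only the '_0' suffix is fixed."""
    template = b"?" * id_len + b"_0"
    mask = bytes([1] * id_len + [0, 0])
    return [k for k in sorted_keys if fuzzy_match(k, template, mask)]


keys = [b"xxxxyyyyzzzz1_1", b"xxxxyyyyzzzz2_0", b"xxxxyyyyzzzz3_1", b"xxxxyyyyzzzz4_0"]
print(rows_without_events(keys, id_len=13))
# [b'xxxxyyyyzzzz2_0', b'xxxxyyyyzzzz4_0']
```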

Also see option 2 in my answer to your other question:

Is there a clever HBase Schema to Aid with Discovering Missing Value?

From my experience with HBase, there is no such thing.

Upvotes: 0
