Is there a clever HBase schema to aid with discovering missing values?

Question

Let's assume I have billions of rows in my HBase table. The set of rows changes slowly: new rowkeys are added and some rowkeys are deleted.

I receive lots of events per row. However, a very small number of rows will not receive any events at all.

At the end of the day I would like to report on the rows that have not received any events.

My naive solution would be to introduce a column cf:c that holds a flag and set the flag to 1 every time I see an event for that row. Then I would do a full scan of the table looking for rowkeys that are missing the column value. That seems like a waste, because I would be scanning 10 billion rows to discover a handful of rowkeys (we are talking about hundreds or low thousands).
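
To make the naive approach concrete, here is a rough sketch in Python. A plain dict stands in for the HBase table, and the names `record_event` and `rows_without_events` are made up for illustration; this is not HBase client code, only the logic I have in mind.

```python
# In-memory stand-in for the HBase table: rowkey -> {column: value}.
# Hypothetical illustration only; a real table would be read through an HBase client.
FLAG_COLUMN = "cf:c"


def record_event(table, rowkey):
    """Set the flag on a row whenever an event arrives for it."""
    table[rowkey][FLAG_COLUMN] = 1


def rows_without_events(table):
    """Full scan: visit every row and keep the rowkeys that lack the flag."""
    return sorted(
        rowkey for rowkey, columns in table.items()
        if FLAG_COLUMN not in columns
    )


# Three rows, two of which receive events during the day.
table = {"row-a": {}, "row-b": {}, "row-c": {}}
record_event(table, "row-a")
record_event(table, "row-c")

missing = rows_without_events(table)
print(missing)  # ['row-b']
```

The cost of `rows_without_events` grows with the total number of rows, not with the number of rows that are missing the flag, which is exactly the part I would like to avoid.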

Is there a clever way to design the HBase schema so that the rowkeys that are missing events can be found quickly (without going through every row)?


