Raj

Reputation: 21

HBase schema design suggestion

Coming from an RDBMS background, I need some help designing an HBase schema for the use case below.

It is a report-generating application built on Hadoop. We now need to track the full report generation history for a particular user, based on their email ID. The data to be persisted are: email ID, report name, start date, end date, and status. I am planning to use the email ID as the row key and the other fields as columns: emailId (row key) - (columns) appName:reportName, appName:startDate, appName:endDate, appName:status

The problem is that the same user can run the same report for different date ranges, so each new run will overwrite the appName:reportName and appName:status columns. Since I am new to the NoSQL world, I am not sure how to tackle this. Can someone please suggest a good way to design the schema for this requirement?

Any help would be greatly appreciated.

Thanks

Upvotes: 2

Views: 457

Answers (1)

Chris Shain

Reputation: 51329

Based on your expected query pattern, here is what I'd suggest:

RowKey                                   | Column Family (appName)
<emailId><timestamp, down to HH:MM:SSS>  | reportName | status | startDate | endDate

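This design gives you a few advantages. First, you can quickly query (using a scan) all rows for a particular user over a particular date range, because that user's rows are stored contiguously and sorted by time. Second, because the user's ID precedes the timestamp in the rowkey, writes are spread across users rather than all landing on the most recent time range, which avoids write hotspots.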
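As a rough illustration of the scan idea, here is a minimal sketch in plain Python, with no HBase client involved. The `|` separator, the exact timestamp layout, the example addresses, and the helper names `make_row_key` and `scan_bounds` are all assumptions made for the example, not part of the schema above. It relies only on the fact that HBase orders rows by the bytes of the row key, and that a scan takes an inclusive start row and an exclusive stop row.

```python
from datetime import datetime

SEP = "|"  # assumed separator between email and timestamp


def make_row_key(email: str, ts: datetime) -> bytes:
    """Email first, then a fixed-width timestamp down to tenths of a second."""
    tenths = ts.microsecond // 100_000
    stamp = ts.strftime("%Y-%m-%d %H:%M:%S") + f".{tenths}"
    return f"{email}{SEP}{stamp}".encode("utf-8")


def scan_bounds(email: str, start: datetime, end: datetime):
    """Start row (inclusive) and stop row (exclusive) for one user's rows in [start, end)."""
    return make_row_key(email, start), make_row_key(email, end)


# Byte-wise sorted keys stand in for HBase's row ordering.
keys = sorted([
    make_row_key("user@example.com", datetime(2013, 1, 5, 9, 30)),
    make_row_key("user@example.com", datetime(2013, 2, 10, 14, 0)),
    make_row_key("other@example.com", datetime(2013, 1, 7, 8, 15)),
])

start_row, stop_row = scan_bounds(
    "user@example.com", datetime(2013, 1, 1), datetime(2013, 2, 1)
)
# Only this user's January row falls inside the half-open key range.
january_rows = [k for k in keys if start_row <= k < stop_row]
print(january_rows)
```

The fixed-width, most-significant-first timestamp is what makes byte order match chronological order. With a real client, the two bounds would be passed as the scan's start and stop rows.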

You can write one row to this schema each time a user triggers the generation of a report, and you won't need to worry about overwriting the columns (unless a user generates two reports in the same 1/10th of a second).
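To make the overwrite behaviour concrete, here is a small simulation that uses a Python dict as a stand-in for the table (an assumption for illustration; a real HBase put on an existing row and column would add a new cell version that hides the old value). The `|` separator, the timestamp layout, and the names `row_key` and `record_run` are hypothetical.

```python
from datetime import datetime


def row_key(email: str, triggered_at: datetime) -> str:
    """Email, then a fixed-width timestamp truncated to tenths of a second."""
    tenths = triggered_at.microsecond // 100_000
    return f"{email}|{triggered_at.strftime('%Y-%m-%d %H:%M:%S')}.{tenths}"


def record_run(table, email, triggered_at, report_name, start_date, end_date, status):
    """Write one row per report run; all columns live in the appName family."""
    table[row_key(email, triggered_at)] = {
        "appName:reportName": report_name,
        "appName:startDate": start_date,
        "appName:endDate": end_date,
        "appName:status": status,
    }


table = {}  # dict standing in for the HBase table: row key -> columns
user = "user@example.com"

# The same report run twice, five minutes apart, for different date ranges.
record_run(table, user, datetime(2013, 3, 1, 10, 0, 0), "sales", "2013-01-01", "2013-01-31", "DONE")
record_run(table, user, datetime(2013, 3, 1, 10, 5, 0), "sales", "2013-02-01", "2013-02-28", "RUNNING")

# Each run landed in its own row, so nothing was overwritten.
for key in sorted(table):
    print(key, table[key]["appName:startDate"], table[key]["appName:status"])
```

Two runs triggered within the same tenth of a second would still map to the same key. If that matters, one option is a finer timestamp or a short unique suffix on the key.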

Upvotes: 1
