Reputation: 21
Coming from an RDBMS background, I need some help designing an HBase schema for the use case below.
It is a report-generating application built on Hadoop. We now need to track the full report-generation history for a particular user, based on their email id. The data to be persisted is: email id, report name, start date, end date, and status. I am planning to use the email id as the row key and the other fields as columns: emailId (row key) - (columns) appName:reportName, appName:startDate, appName:endDate, appName:status
The problem is that the same user can run the same report for different date ranges, so each new run will overwrite the appName:reportName and appName:status columns. Since I am new to the NoSQL world, I am not sure how to tackle this. Can someone please suggest a good way to design the schema for this requirement?
Any help would be greatly appreciated.
Thanks
Upvotes: 2
Views: 457
Reputation: 51329
Based on your expected query pattern, here is what I'd suggest:
RowKey                            | Column Family (appName)
<emailId> <timestamp, HH:MM:SSS>  | reportName | status | startDate | endDate
This design gives you a few advantages. First, you can quickly query (using a scan) all rows for a particular user over a particular date range. Second, because the user's ID precedes the timestamp in the rowkey, writes are spread across users rather than all landing on the region that holds the most recent timestamps, so you avoid write hotspots.
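To make the scan behaviour concrete, here is a minimal sketch in plain Python (no HBase client involved). It assumes a row key of the form email + separator + fixed-width timestamp; the `SEP` value, the exact timestamp layout, and the helper names `row_key` and `scan` are illustrative assumptions, not anything HBase prescribes. A sorted list stands in for HBase's lexicographically ordered row keys.

```python
from bisect import bisect_left
from datetime import datetime

SEP = "\x00"  # hypothetical separator; sorts below any printable character


def row_key(email: str, ts: datetime) -> str:
    # Fixed-width, zero-padded timestamp so that lexicographic order
    # matches chronological order within one user's rows.
    return f"{email}{SEP}{ts:%Y%m%d%H%M%S}{ts.microsecond // 1000:03d}"


def scan(sorted_keys, start_key, stop_key):
    """Return keys in [start_key, stop_key), like a start/stop-row scan."""
    lo = bisect_left(sorted_keys, start_key)
    hi = bisect_left(sorted_keys, stop_key)
    return sorted_keys[lo:hi]


# Stand-in for the table's sorted row keys.
keys = sorted([
    row_key("alice@example.com", datetime(2013, 1, 5, 9, 30, 0)),
    row_key("alice@example.com", datetime(2013, 2, 10, 14, 0, 0)),
    row_key("alice@example.com", datetime(2013, 3, 1, 8, 15, 0)),
    row_key("bob@example.com", datetime(2013, 1, 20, 11, 0, 0)),
])

# All of one user's reports triggered in January 2013.
january_alice = scan(
    keys,
    row_key("alice@example.com", datetime(2013, 1, 1)),
    row_key("alice@example.com", datetime(2013, 2, 1)),
)

# Every report for one user: scan the whole key prefix.
all_alice = scan(
    keys,
    "alice@example.com" + SEP,
    "alice@example.com" + chr(ord(SEP) + 1),
)
```

Because every key starts with the email, one user's rows are contiguous, and a date range within that user becomes a single start/stop scan.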
You can write one row to this schema each time a user triggers the generation of a report, and you won't need to worry about overwriting the columns (unless a user generates two reports in the same 1/10th of a second).
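The overwrite behaviour can be illustrated with a small in-memory stand-in for the table (a plain Python dict, not the HBase API). The key layout, the tenth-of-a-second resolution, and the names `table`, `row_key`, and `record_run` are assumptions made for this sketch only.

```python
from datetime import datetime

table = {}  # in-memory stand-in: row key -> {column: value}


def row_key(email, triggered_at):
    # Assumed resolution: tenths of a second, matching the collision
    # window mentioned above.
    tenths = triggered_at.microsecond // 100_000
    return f"{email} {triggered_at:%Y-%m-%d %H:%M:%S}{tenths}"


def record_run(email, triggered_at, report_name, start_date, end_date, status):
    # Writing to an existing row key replaces that row's columns.
    table[row_key(email, triggered_at)] = {
        "appName:reportName": report_name,
        "appName:startDate": start_date,
        "appName:endDate": end_date,
        "appName:status": status,
    }


# Same user, same report, different date ranges, triggered at different times:
record_run("alice@example.com", datetime(2013, 1, 5, 9, 30, 0),
           "sales", "2012-01-01", "2012-06-30", "DONE")
record_run("alice@example.com", datetime(2013, 1, 5, 9, 45, 0),
           "sales", "2012-07-01", "2012-12-31", "RUNNING")
rows_after_two_runs = len(table)

# A third run triggered 20 ms after the second falls into the same
# tenth of a second, so it lands on the same row key and overwrites it.
record_run("alice@example.com", datetime(2013, 1, 5, 9, 45, 0, 20_000),
           "sales", "2011-01-01", "2011-12-31", "DONE")
rows_after_collision = len(table)
```

The first two runs produce two separate rows; only the third, triggered within the same key resolution, replaces an existing row.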
Upvotes: 1