Reputation: 2240
I'm maintaining a system where users create something called "books" that are accessed by other users.
I need a convenient, well-performing way to store an event in the database each time a user visits one of these books, so that I can later display graphs with statistics. The graphs should show the book's owner a history of which days of the week, and which times of day, see the most visits (across all months).
Using an ERD (entity-relationship diagram), I arrived at a conceptual model in which a user visits a book.
At first the problem seems solved, since the situation is very simple. The model gives me a table with 3 fields: one holds the time the visit occurred, and the other 2 are foreign keys, one identifying the user and the other identifying the book that was visited. In short, every record in this table is one visit.
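To make the three-field table and the weekday/hour statistics concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` module purely as a stand-in for PostgreSQL, and the names `visit`, `user_id`, `book_id`, and `visited_at` are assumptions made for illustration. The user and book tables that the foreign keys would point to are left out.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical three-field visit table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE visit (
            user_id    INTEGER NOT NULL,  -- would reference the user table
            book_id    INTEGER NOT NULL,  -- would reference the book table
            visited_at TEXT    NOT NULL   -- 'YYYY-MM-DD HH:MM:SS'
        )
        """
    )
    # The graphs are per book, so index on book first, then time.
    conn.execute("CREATE INDEX visit_book_time ON visit (book_id, visited_at)")
    return conn


def visits_by_weekday_and_hour(conn, book_id):
    """Return (weekday, hour, visits) rows for one book; weekday 0 is Sunday."""
    return conn.execute(
        """
        SELECT CAST(strftime('%w', visited_at) AS INTEGER) AS weekday,
               CAST(strftime('%H', visited_at) AS INTEGER) AS hour,
               COUNT(*) AS visits
        FROM visit
        WHERE book_id = ?
        GROUP BY weekday, hour
        ORDER BY weekday, hour
        """,
        (book_id,),
    ).fetchall()
```

Each returned row is one bucket of the graph: a day of the week, an hour of the day, and the number of visits that fell into it.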
However, considering that a user can average about 10 to 30 book visits per day, and that the system has 100,000 users, this table could grow by many gigabytes of new records in a single day. I'm not the most experienced person when it comes to database performance practices, but I'm pretty sure that this is not the solution.
Even if I clean up the database to delete old records, I need to keep at least the last 2 months of visit history.
I've been looking for a way to solve this for days, and I have not found anything yet. Could someone help me, please?
Thank you.
Note: I'm using PostgreSQL 9.x, and the system is written in Java.
Upvotes: 0
Views: 286
Reputation: 230461
As mentioned in the comments, you might be overestimating the data size. Let's do the math: 100k users, each visiting 30 books per day, at, say, 30 bytes per record.
(100_000 * 30 * 30) / 1_000_000 # => 90 megabytes per day
Even if you add index size and some amount of overhead, this is still a few orders of magnitude lower than "many gigabytes per day".
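The same arithmetic can be extended to the two-month retention window from the question. This is a rough Python sketch: the 30 bytes per record and the 60-day window are assumptions, and real rows and indexes add overhead on top of these figures.

```python
def daily_bytes(users, visits_per_user_per_day, bytes_per_record):
    """Raw bytes of new visit records written per day."""
    return users * visits_per_user_per_day * bytes_per_record


def retained_bytes(per_day, retention_days):
    """Raw bytes held when keeping `retention_days` of history."""
    return per_day * retention_days


per_day = daily_bytes(100_000, 30, 30)   # assumed: 30 visits/user/day, 30 bytes/record
total = retained_bytes(per_day, 60)      # assumed: roughly two months of history

print(per_day / 1_000_000, "MB per day")
print(total / 1_000_000_000, "GB retained")
```

Under these assumptions the whole two-month history comes to about 5.4 GB of raw record data, which is still a modest size for a single table.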
Upvotes: 1