Kaan Baris

Reputation: 3401

Database table structure for storing statistics data

I am trying to create a table in my MySQL database for storing click data for my posts on a daily basis. What I came up with is something like this:

ID | post_id | click_type | created_date
1  | 1       | page_click | 2015-12-11 18:13:13
2  | 2       | page_click | 2015-12-13 11:16:34
3  | 3       | page_click | 2015-12-13 13:24:01
4  | 1       | page_click | 2015-12-15 15:31:10

With this kind of storage I can find out how many clicks post number 1 received in December 2015, and even how many clicks a given post received on 15 December between 1 and 11 pm. However, let's say I am getting 2,000 clicks per day; that means 2,000 new rows per day, 60,000 per month and 720,000 per year.
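In SQL, a minimal sketch of this first design would look roughly like this (the post_clicks table name and the index are just placeholders; the columns are taken from the example above):

    CREATE TABLE post_clicks (
        id INT UNSIGNED NOT NULL AUTO_INCREMENT,
        post_id INT UNSIGNED NOT NULL,
        click_type VARCHAR(32) NOT NULL,
        created_date DATETIME NOT NULL,
        PRIMARY KEY (id),
        KEY idx_post_date (post_id, created_date)
    );

    -- Clicks for post number 1 in December 2015:
    SELECT COUNT(*)
    FROM post_clicks
    WHERE post_id = 1
      AND created_date >= '2015-12-01'
      AND created_date <  '2016-01-01';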

Another approach that comes to mind is the following, which stores one row per post per day and, if there is more than one click on that day, increments a count:

ID | post_id | click_type | created_date | count
1  | 1       | page_click | 2015-12-11   | 13
2  | 2       | page_click | 2015-12-11   | 26
3  | 3       | page_click | 2015-12-11   | 152
4  | 1       | page_click | 2015-12-12   | 14
5  | 2       | page_click | 2015-12-12   | 123
6  | 3       | page_click | 2015-12-12   | 163

With this approach, if every post is clicked at least once each day (which is what creates the row), it will generate 1,000 rows per day (let's say I have 1,000 posts), 30,000 per month and 360,000 per year.
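A rough sketch of this second design (again with made-up names): a unique key on (post_id, click_type, created_date) lets a single INSERT ... ON DUPLICATE KEY UPDATE either create the day's row or bump its counter:

    CREATE TABLE post_click_counts (
        id INT UNSIGNED NOT NULL AUTO_INCREMENT,
        post_id INT UNSIGNED NOT NULL,
        click_type VARCHAR(32) NOT NULL,
        created_date DATE NOT NULL,
        `count` INT UNSIGNED NOT NULL DEFAULT 1,
        PRIMARY KEY (id),
        UNIQUE KEY uq_post_day (post_id, click_type, created_date)
    );

    -- One statement per click: insert today's row or increment it.
    INSERT INTO post_click_counts (post_id, click_type, created_date)
    VALUES (1, 'page_click', CURRENT_DATE)
    ON DUPLICATE KEY UPDATE `count` = `count` + 1;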

I am looking for advice on how to store these statistics so that I can get daily click statistics out of them. I have some concerns about performance (of course it's nothing for big data guys :D, but forgive my lack of experience). Do you think it will be OK if there are over 1 million rows in that table after 2-3 years? And which approach do you think will be more effective for me?

Upvotes: 1

Views: 2140

Answers (2)

Joel Brown

Reputation: 14408

720,000 records per year is not necessarily a lot of data. One option is simply not to worry about it. Something to consider is how long the click data matters: if after a year you don't really care anymore, you can have a historical data cleanup protocol that removes data older than that.
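For example, a nightly job along these lines would keep only the trailing year (assuming the per-click table from the question, here called post_clicks):

    -- Hypothetical retention job: drop click rows older than one year.
    DELETE FROM post_clicks
    WHERE created_date < NOW() - INTERVAL 1 YEAR;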

If you are worried about storing large amounts of data and you don't want to erase history, then you can consider pre-calculating your summarized statistics and storing them instead of your transaction detail.

The issue with this is that you have to decide in advance the smallest time resolution you will continue to care about. Also, if your motivation is saving space, you have to be careful that your summary data doesn't end up taking more space than the original transactions. This can easily happen if you store summarized data at multiple resolutions, as you might in a data warehouse arrangement.
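As a sketch, a daily rollup might look like this (post_click_summary is a hypothetical summary table; the source table name is taken from the sketch in the question):

    CREATE TABLE post_click_summary (
        post_id INT UNSIGNED NOT NULL,
        click_type VARCHAR(32) NOT NULL,
        stat_date DATE NOT NULL,
        clicks INT UNSIGNED NOT NULL,
        PRIMARY KEY (post_id, click_type, stat_date)
    );

    -- Summarize yesterday's clicks; the raw rows can then be deleted.
    INSERT INTO post_click_summary (post_id, click_type, stat_date, clicks)
    SELECT post_id, click_type, DATE(created_date), COUNT(*)
    FROM post_clicks
    WHERE created_date >= CURDATE() - INTERVAL 1 DAY
      AND created_date <  CURDATE()
    GROUP BY post_id, click_type, DATE(created_date);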

Upvotes: 2

Ctx

Reputation: 18410

This seems like a good application for rrdtool (http://oss.oetiker.ch/rrdtool/). It lets you specify several resolutions for different time intervals, e.g.:

  • average 5 min for 1 day
  • average 30 min for 1 week
  • average 2 hours for 1 month
  • average 1 day for 1 year

etc. rrdtool is also often used for generating graphs. Usually it works with its own RRD files, but it can also be backed by MySQL with rrdgraph_libdbi; a hypothetical create call matching the resolutions above is sketched below.
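One way to set up such an RRD (the file name, data-source name, and the choice of ABSOLUTE for per-interval click counts are all assumptions; the base step is 5 minutes):

    # 5-min averages for 1 day (288 rows), 30-min for 1 week (336 rows),
    # 2-hour for 1 month (372 rows), 1-day for 1 year (366 rows).
    rrdtool create post_clicks.rrd --step 300 \
        DS:clicks:ABSOLUTE:600:0:U \
        RRA:AVERAGE:0.5:1:288 \
        RRA:AVERAGE:0.5:6:336 \
        RRA:AVERAGE:0.5:24:372 \
        RRA:AVERAGE:0.5:288:366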

Upvotes: 1
