Reputation: 1672
How to build a proper structure for an analytics service? Currently i have 1 table that stores data about every user that visits the page with my client's ID so later my clients will be able to see the statistics for a specific date.
I've thought a bit today and I'm wondering: Let's say i have 1,000 users and everyone has around 1,000 impressions on their sites daily, means i get 1,000,000 (1M) new records every day to a single table. How will it work after 2 months or so (when the table reaches 60 Million records)?
I just think that after some time it will have so much records that the PHP queries to pull out the data will be really heavy, slow and take a lot of resources, is it true? and how to prevent that?
A friend of mine working on something similar and he is gonna make a new table for every client, is this the correct way to go with?
Thanks!
Upvotes: 6
Views: 1028
Reputation: 6499
Consider this Link to the Google Analytics Platform Components Overview page and pay special attention to the way the data is written to the database, simply based on the architecture of the entire system.
Instead of writing everything to your database right away, you could write everything to a log file, then process the log later (perhaps at a time when the traffic isn't so high). At the end of the day, you'll still need to make all of those writes to your database, but if you batch them together and do them when that kind of load is more tolerable, your system will scale a lot better.
Upvotes: 1
Reputation: 23552
Having a table with 60 million records can be ok. That is what a database is for. But you should be careful about how many fields you have in the table. Also what datatype (=>size) each field has.
You create some kind of reports on the data. Think about what data you really need for those reports. For example you might need only the numbers of visits per user on every page. A simple count would do the trick.
What you also can do is generate the report every night and delete the raw data afterwards.
So, read and think about it.
Upvotes: -1
Reputation: 14091
Problem you are facing is I/O bound system. 1 million records a day is roughly 12 write queries per second. That's achievable, but pulling the data out while writing at the same time will make your system to be bound at the HDD level.
What you need to do is configure your database to support the I/O volume you'll be doing, such as - use appropriate database engine (InnoDB and not MyISAM), make sure you have fast enough HDD subsystem (RAID, not regular drives since they can and will fail at some point), design your database optimally, inspect queries with EXPLAIN
to see where you might have gone wrong with them, maybe even use a different storage engine - personally, I'd use TokuDB if I were you.
And also, I sincerely hope you'd be doing your querying, sorting, filtering on the database side and not on PHP side.
Upvotes: 2
Reputation: 4702
You could normalize impressions the data like this;
Client Table
{
ID
Name
}
Pages Table
{
ID
Page_Name
}
PagesClientsVisits Table
{
ID
Client_ID
Page_ID
Visits
}
and just increment visits on the final table on each new impression. Then the maximum number of records in there becomes (No. of clients * No. of pages)
Upvotes: -1