Reputation: 3690
I currently already have a website running using CodeIgniter and MySQL. The MySQL database is around 110 tables big and contains mainly website specific data, like user data, vacancy data, etc.
Now I want to extend this website to include a complete statistical module as well. We would capture a lot of user actions and other aggregations from the data gather on our own website, and would also pull in some data from google analytics API to use in our statistics (we will generate a report in Excel but also show statistical graphs and numbers on a page (using chart.js)). We are not thinking (in a forseeable future) to use this data in other programs, but we need to be able to open some data to the public using an API. We expect to start with about 300.000-350.000 data points gathered per day, but this amount will keep on growing every day of course, the more users we get.
Using multiple databases in CodeIgniter seems to not be an issue, so the main problem I am left with is how I should create the architecture for this statistical module.
I have a couple of idea's on how to start doing this, but I am not aware if there is performance impact from one to another solution or other things to take into consideration. My main idea boils down to having a table containing all "events", which just insert in that table every time an action is performed, eg "user is registered", "user put account on private", "user clicked on X", ... Then once a day (probably at around midnight), a CRON job would run over that table for the past day and aggregate all the values into a format usable for our statistical metrics. Those aggregated values would be stored in a new table. This way we can clean up the "event" table quite regularly since that will become very big very fast.
Idea 1: Extend the current MySQL database architecture with new tables to incorporate the statistics. I would keep on using the current database architecture and add 2 new tables for the events and the aggregated values.
Idea 2: Create a new database, separate from the current existing one, and use this to insert all the events in a table there and the aggregated values in a new table there.
Note: we already have quite a few CRONS running on our current database, updating statusses and dates, sending emails, ...
Note2: sync issues between databases is not an issue since we will never be storing statistics on a per-user level.
Upvotes: 1
Views: 153
Reputation: 142366
MySQL does not care whether tables are in the same database or separate databases. It is just a convenience for the user. Some things:
db1.tbla JOIN db2.tblb
to talk across dbs.GRANTs
for different databases, but clumsy to have different GRANTs
for 110 tables.Nightly aggregation is a middle-of-the road approach. Using IODKU gives you 'immediate' aggregation, but is probably more burden on the system.
350K rows inserted per day is about 5/second, which is comfortably low, so I don't think we need to discuss performance issues there.
"Summarize and toss" (for events) -- Yes. I like that approach. (Most people fail to think of this option.)
Do the math. Which table is the largest after a year? How many GB will it be? Then think about whether you can shrink any of the columns in it: SMALLINT
instead of INT
, normalization of long, oft-repeated, strings, etc.
Upvotes: 1