Reputation: 73
I have a table that contains some statistical data which is collected per hour. Now I want to be able to quickly get statistics per day / week / month / year / total. What is the best way to do this performance-wise? Creating views? Functions? Stored procedures? Or normal tables that I have to write to simultaneously when updating the data? (I would like to avoid the latter.) My current idea is to create a view_day which sums up the hours, then view_week, view_month and view_year which sum up data from view_day, and a view_total which sums up view_year. Is that good or bad?
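To illustrate, this is roughly what I have in mind, using made-up table and column names (stats_hourly with stat_hour and hits stand in for my real table); the exact date functions would depend on the engine:

    -- Hypothetical hourly source: stats_hourly(stat_hour DATETIME, hits INT)

    -- Daily roll-up straight from the hourly rows.
    CREATE VIEW view_day AS
    SELECT CAST(stat_hour AS DATE) AS stat_date,
           SUM(hits)               AS hits
    FROM   stats_hourly
    GROUP  BY CAST(stat_hour AS DATE);

    -- Monthly roll-up built on top of view_day (view_week, view_year and
    -- view_total would follow the same pattern).
    CREATE VIEW view_month AS
    SELECT YEAR(stat_date)  AS stat_year,
           MONTH(stat_date) AS stat_month,
           SUM(hits)        AS hits
    FROM   view_day
    GROUP  BY YEAR(stat_date), MONTH(stat_date);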
Upvotes: 4
Views: 2035
Reputation: 2476
We have a similar problem, and what we do is use a master/slave relationship. All transactional work (both reads and writes, since in our case some reads need to be ultra fast and can't wait for replication) goes to the master. The slave replicates the data quickly, and we run every non-transactional query off it, including reporting.
I highly suggest this method as it's simple to put into place as a quick and dirty data warehouse if your data is granular enough to be useful in the reporting layers/apps.
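As a rough sketch only, assuming MySQL (the question doesn't name an engine) and that binary logging is already enabled on the master, wiring up the slave looks something like this; every name and credential here is a placeholder:

    -- On the master: create a dedicated replication account.
    CREATE USER 'repl'@'%' IDENTIFIED BY 'change_me';
    GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';

    -- On the slave: point it at the master and start replicating.
    CHANGE MASTER TO
        MASTER_HOST = 'master.example.com',
        MASTER_USER = 'repl',
        MASTER_PASSWORD = 'change_me',
        MASTER_LOG_FILE = 'mysql-bin.000001',
        MASTER_LOG_POS = 4;
    START SLAVE;

    -- All reporting queries are then pointed at the slave.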
Upvotes: 0
Reputation: 96572
My view is that complex calculations should only happen once, as the data changes, not every time you query. Create an aggregate table and populate it either through a trigger (if no lag is acceptable) or through a job that runs once a day, once an hour, or whatever lag time is acceptable for reporting. If you go the trigger route, test, test, test. Make sure it can handle multiple-row inserts/updates/deletes as well as the more common single-row ones. Make sure it is as fast as possible and has no bugs whatsoever. Triggers add a bit of processing to every data action; you have to make sure they add the smallest possible bit and that no bug will ever prevent users from inserting, updating, or deleting data.
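A rough sketch of the trigger route, assuming SQL Server and invented table names (stats_hourly feeding a stats_daily aggregate); the important part is that it aggregates the whole inserted pseudo-table, so multi-row inserts are handled:

    -- Aggregate table maintained by the trigger (hypothetical schema).
    CREATE TABLE stats_daily (
        stat_date DATE   NOT NULL PRIMARY KEY,
        hits      BIGINT NOT NULL
    );
    GO

    CREATE TRIGGER trg_stats_hourly_insert
    ON stats_hourly
    AFTER INSERT
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Aggregate everything in "inserted", not just one row,
        -- so multi-row INSERTs roll up correctly.
        MERGE stats_daily AS d
        USING (SELECT CAST(stat_hour AS DATE) AS stat_date,
                      SUM(hits)               AS hits
               FROM   inserted
               GROUP  BY CAST(stat_hour AS DATE)) AS i
            ON d.stat_date = i.stat_date
        WHEN MATCHED THEN
            UPDATE SET d.hits = d.hits + i.hits
        WHEN NOT MATCHED THEN
            INSERT (stat_date, hits) VALUES (i.stat_date, i.hits);

        -- UPDATE and DELETE triggers would need matching logic.
    END;
    GO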
Upvotes: 0
Reputation: 10984
You essentially have two systems here: One that collects data and one that reports on that data.
Running reports against your frequently updated transactional tables will likely result in read locks that keep writes from completing as quickly as they otherwise could, and can therefore degrade performance.
It is generally HIGHLY advisable to run a periodic "gathering" task that collects information from your (probably highly normalized) transactional tables and stuffs that data into denormalized reporting tables, forming a "data warehouse". You then point your reporting engine / tools at the denormalized "data warehouse", which can be queried without impacting the live transactional database.
This gathering task should only run as often as your reports need to be "accurate". If you can get away with once a day, great. If you need to do this once an hour or more, then go ahead, but monitor the performance impact on your writing tasks when you do.
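A minimal sketch of such a gathering step, again with invented names (stats_hourly as the transactional source, report_daily_stats as the warehouse table) and SQL Server date functions; it reloads yesterday's figures and would be run from whatever scheduler you have (SQL Server Agent, cron, etc.):

    -- Denormalized reporting table, loaded on a schedule rather than by the app.
    CREATE TABLE report_daily_stats (
        stat_date DATE   NOT NULL PRIMARY KEY,
        hits      BIGINT NOT NULL
    );

    -- Job body: refresh yesterday's row from the transactional table.
    DELETE FROM report_daily_stats
    WHERE  stat_date = CAST(DATEADD(day, -1, GETDATE()) AS DATE);

    INSERT INTO report_daily_stats (stat_date, hits)
    SELECT CAST(stat_hour AS DATE), SUM(hits)
    FROM   stats_hourly
    WHERE  CAST(stat_hour AS DATE) = CAST(DATEADD(day, -1, GETDATE()) AS DATE)
    GROUP  BY CAST(stat_hour AS DATE);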
Remember, if the performance of your transactional system is important (and it generally is), avoid running reports against it at all costs.
Upvotes: 3
Reputation: 10645
The only really fast and scalable solution is, as you put it, "normal tables where you have to write to simultaneously when updating data", with proper indexes. You can automate updating such tables with triggers.
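For example (a sketch only; the columns, the extra index, and the MySQL-style upsert are all assumptions), this is the kind of indexed summary table a trigger would keep current:

    -- Pre-aggregated table with indexes matching the common lookups.
    CREATE TABLE daily_summary (
        stat_date DATE   NOT NULL,
        item_id   INT    NOT NULL,
        hits      BIGINT NOT NULL,
        PRIMARY KEY (stat_date, item_id),
        KEY idx_item_date (item_id, stat_date)  -- per-item history queries
    );

    -- Upsert a trigger (or the application) would run for each new hourly row.
    INSERT INTO daily_summary (stat_date, item_id, hits)
    VALUES (CURRENT_DATE, 42, 10)
    ON DUPLICATE KEY UPDATE hits = hits + VALUES(hits);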
Upvotes: 1
Reputation: 254926
Yes, having tables that store already-aggregated data is good practice.
Views, stored procedures, and functions, on the other hand, will still run their queries against the big underlying tables every time, which is not as efficient.
Upvotes: 1