Reputation: 1771
I'm wondering what the best practices are in terms of storing counts / sums etc are? Is calculating on the fly on each page load a big no no?
I have a Rails App that contains products and sales reports from multiple sources. Some of these sales reports can stretch to 100k+ lines per day. At the moment, we're calculating on the fly for each page load. This is clearly extremely slow.
As the sales reports are almost certainly written once but read many times, would it be better to add a sum column to my product table and increment that each time new sales are written? I can see how this would be clearly a lot faster, but it does seriously limit my ability to offer date filtering; 'Top 10 in last 7 days' etc.
Would another solution that would enable filtering be to create a third table with daily sums for each product? The main sales table can contain multiple, sometimes thousands of lines for each product each day so again I can see how a condensed version of the main sales table could speed things up as well.
Should I be doing these calculations with before/after_save callbacks / during my import or should should I look at scheduled tasks to run in the background?
Can anyone offer some suggestions or recommended reading to help set me on the right path and help me towards improving my embarrassingly slow page load times?
Thanks in advance!
Upvotes: 1
Views: 114
Reputation: 26454
Great questions.
Generally I think it is good to start with generating the counts and sums on the fly. This follows Codd's rules, etc. The problem is that this may get slow over time. How you deal with it will depend on quite a number of things. I will give you my solutions for dealing with sales data and why I see this as ideal.
Sales data is significant because while it can be adjusted sometimes for brief periods of time after a while it will never be adjusted any more absent foul play (this is after books are closed). Sales books are often closed either monthly or yearly.
what you can do is put the aggregates in a closing entry with the date, or max id or the like, and put in triggers so that you can ensure that no entries get added to a closed period. this allows you to roll forward your aggregates from a known good value. In this case you are likely to aggregate only over periods between closings or over open periods.
Upvotes: 1