Reputation: 128
I've got a very simple algorithm that I'm playing with to determine what user-submitted content should display on the front page of a website I'm building. Right now the algorithm is triggered each time the page is loaded. The algorithm goes through all of the posts, calculates the newest weighted score, and displays the posts accordingly.
With no one on the site, this works fine, but it seems unlikely to scale well. Is it typical in industry to optimize these cases? If I need to cache the calculations, how is that best done? I'm very much self-taught, so although I can get things to work, I don't always know if it's ideal.
Part of the algorithm is time, which is important here. Aside from time, there are some other variables at play that I weight differently. I add all these "scores" together to form one "weighted score"; the higher the score, the higher the post ranks.
Upvotes: 3
Views: 526
Reputation: 2461
You could cache the results in the database, say in a field "Score", then upon a user accessing the page, run a SQL select to find any articles with a null score.
SQL: SELECT * FROM Articles WHERE Score IS NULL
Calculate these scores and store them to their associated articles, then utilize them through an ordered select statement to find which articles to display, possibly limiting how many articles to fetch and even performing pagination entirely through the cache.
Note: The scores should be absolute, based entirely on the article in question, not relative to the contents of other articles in the database.
SQL: SELECT * FROM Articles ORDER BY Score DESC
Further efficiency gains are possible by limiting cache generation to events that actually change the articles. For example, you could trigger cache generation when a new article is submitted or when an existing article is edited.
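As a minimal sketch of the approach above, assuming PHP with PDO and an in-memory SQLite database: the Articles table and Score field come from the queries above, while the Id, Upvotes and Views columns, the compute_score() formula, and the function names are hypothetical stand-ins for whatever your real schema and scoring rule are.

```php
<?php
// Hypothetical scoring rule, for illustration only. It depends solely on
// the article itself, so the result is safe to cache.
function compute_score(array $article): float
{
    return $article['Upvotes'] * 2.0 + $article['Views'] * 0.01;
}

// Compute and store a score for every article that does not have one yet.
// Returns the number of articles that were scored.
function fill_missing_scores(PDO $db): int
{
    $rows = $db->query('SELECT * FROM Articles WHERE Score IS NULL')
               ->fetchAll(PDO::FETCH_ASSOC);
    $update = $db->prepare('UPDATE Articles SET Score = :score WHERE Id = :id');
    foreach ($rows as $row) {
        $update->execute([':score' => compute_score($row), ':id' => $row['Id']]);
    }
    return count($rows);
}

// Read the front page straight from the cached scores, highest first.
function front_page(PDO $db, int $limit): array
{
    $stmt = $db->prepare(
        'SELECT Id, Score FROM Articles ORDER BY Score DESC LIMIT :limit'
    );
    $stmt->bindValue(':limit', $limit, PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Demo data: three unscored articles.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE Articles (
    Id INTEGER PRIMARY KEY, Upvotes INTEGER, Views INTEGER, Score REAL
)');
$db->exec('INSERT INTO Articles (Upvotes, Views) VALUES (3, 526), (10, 100), (0, 50)');

$filled = fill_missing_scores($db);       // scores all three articles
$top = front_page($db, 2);                // the two highest-scored articles
$filled_again = fill_missing_scores($db); // nothing left to score
```

To tie this to edit events, the edit handler could set the article's Score back to NULL, so that the next call to fill_missing_scores() recalculates only that article.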
Upvotes: 3
Reputation: 28316
The standard way to run anything periodically is cron. You can make it run any command periodically, including PHP scripts.
You could also cache the score of the post, or at least the part of the score related to its content, to increase efficiency. Full-text processing is expensive, so the score is certainly worth caching in the database from this point of view.
The trick is to figure out how to implement it in a way that allows you to score the post based on both content and age, whilst still allowing you to cache it. I would create a base score that is calculated from the content, then cache that. When you want to get the real score, you retrieve the cached base score and adjust it based on the age of the post.
Example:
// fetch cached post score, which doesn't take time into account
$base_score = get_post_base_score($post_id);
// now adjust the base score given how old the post is
$score = adjust_score($base_score, time() - $post_time);
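One possible shape for adjust_score(), as a minimal sketch: it assumes the score should halve for every day of age, which is an assumption of mine and not something the question specifies. The HALF_LIFE_SECONDS constant and the decay formula are illustrative; any function that decreases with age would fit the same pattern.

```php
<?php
// Assumed half-life: a post loses half of its score every 24 hours.
const HALF_LIFE_SECONDS = 86400;

// Scale the cached, time-independent base score by the age of the post.
function adjust_score(float $base_score, int $age_seconds): float
{
    // Guard against clock skew producing a negative age.
    $age_seconds = max(0, $age_seconds);
    return $base_score * pow(0.5, $age_seconds / HALF_LIFE_SECONDS);
}

// Usage: a post with base score 80 that was submitted two days ago.
$now = time();
$post_time = $now - 2 * HALF_LIFE_SECONDS;
$score = adjust_score(80.0, $now - $post_time);
```

Because the age adjustment is cheap arithmetic, it can run on every page load while the expensive content-based part stays cached.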
Upvotes: -1
Reputation: 1954
There is no standard, really. Some systems run on an interval, like once a day, or once an hour. Others run each time the page is accessed. Caching can be used to reduce the load in the latter case.
It all depends on how efficiently the algorithm scales, how many posts it will have to deal with, and how often the information is needed. If the operation is quick and cheap, you may as well just run it every time the page is accessed for your initial version. If it is fast enough in your testing and doesn't exhaust the server's memory, then doing any more work is a waste of time. If it isn't good enough, think about caching the result, investing in better hardware, or looking for opportunities to improve the code.
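If you do end up caching the result, a minimal sketch of a time-based cache might look like the following. It assumes the ranking can be represented as an array of post ids and stored as JSON in a temporary file; the function name cached_ranking(), the file name, and the 300-second lifetime are all hypothetical choices.

```php
<?php
// Return the cached ranking while it is fresh; otherwise recompute and store it.
function cached_ranking(string $cache_file, int $ttl_seconds, callable $compute): array
{
    // Make sure PHP does not reuse stale file metadata for this path.
    clearstatcache(true, $cache_file);
    if (is_file($cache_file) && time() - filemtime($cache_file) < $ttl_seconds) {
        $cached = json_decode(file_get_contents($cache_file), true);
        if (is_array($cached)) {
            return $cached;
        }
    }
    $ranking = $compute();
    file_put_contents($cache_file, json_encode($ranking), LOCK_EX);
    return $ranking;
}

// Stand-in for the real ranking algorithm; counts how often it actually runs.
$calls = 0;
$rank_posts = function () use (&$calls): array {
    $calls++;
    return [42, 7, 19]; // placeholder post ids, highest score first
};

$cache_file = sys_get_temp_dir() . '/front_page_ranking.json';
@unlink($cache_file); // start the demo from an empty cache

$first = cached_ranking($cache_file, 300, $rank_posts);  // computes and stores
$second = cached_ranking($cache_file, 300, $rank_posts); // served from the cache
```

With this in place the ranking algorithm runs at most once per lifetime window, no matter how many times the page is loaded.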
If the results don't need to change very often, just schedule it to run once an hour/minute or so, and make sure it meets your needs before shipping.
It is generally better to test the simplest solution before worrying about optimisation.
Upvotes: 3
Reputation: 33455
You're currently following the "don't store if you can calculate" tactic taught as a first step in database design classes.
However, if the "score" is not likely to change frequently, then it may be better to process all entries on a schedule, storing their score in a database, and then just pull the highest-scored items when the page loads.
Upvotes: 2