Reputation: 4941
I have a test page that shows trending hashtags, which are stored in my database. The page takes a long time to load (up to 2 minutes) before any content is displayed, even though only one function is used on that page.
URL: http://www.sudanesetweeps.com/trendingtopics.php
How can I adjust the statements so the page loads in less time?
This is my code:
<?php
require_once("dbconnect.php");
require_once("lib_isarabic.php");

$query = "SELECT COUNT( * ) cnt, hashtags
    FROM tweets
    WHERE tweeted_at >= DATE_SUB( NOW( ) , INTERVAL 2 DAY )
    AND hashtags != ''
    GROUP BY hashtags
    ORDER BY cnt DESC LIMIT 100";
$res = mysql_query($query);

while($row = mysql_fetch_assoc($res) ) {
    $count = $row['cnt'];
    $hashtags = explode( " ", $row['hashtags'] );
    foreach($hashtags as $hashtag ) {
        if( strtolower($hashtag) != 'sudan' && strtolower($hashtag) != 'new' )
            if( is_arabic($hashtag) )
                $topics_ara[strtolower( trim($hashtag) )] += $count;
            else
                $topics_eng[strtolower( trim($hashtag) )] += $count;
    }
}

array_multisort($topics_ara, SORT_DESC);
array_multisort($topics_eng, SORT_DESC);

$index = 0;
foreach($topics_eng as $key=>$value) {
    $query = "SELECT count(*) cnt FROM (
        SELECT count(*), tweeted_by FROM tweets
        WHERE hashtags like '%$key%'
        AND tweeted_at >= DATE_SUB( NOW( ) , INTERVAL 2 DAY )
        GROUP BY tweeted_by
        ) AS T";
    /* $query = "
        SELECT count(*) FROM tweets
        WHERE hashtags like '%$key%'
        AND tweeted_at > DATE_SUB( NOW( ) , INTERVAL 1 DAY ) ";
    */
    $res = mysql_query($query);
    $row = mysql_fetch_assoc($res);
    if($row['cnt'] > 1) {
        $index++;
        if($key != "" ) {
            $trending_eng[$key] = $value;
        }
    }
    if($index > 30)
        break;
}

$index = 0;
foreach($topics_ara as $key=>$value) {
    $query = "SELECT count(*) cnt FROM (
        SELECT count(*), tweeted_by FROM tweets
        WHERE hashtags like '%$key%'
        AND tweeted_at >= DATE_SUB( NOW( ) , INTERVAL 2 DAY )
        GROUP BY tweeted_by
        ) AS T";
    $res = mysql_query($query);
    $row = mysql_fetch_assoc($res);
    if($row['cnt'] > 1) {
        $index++;
        if($key != "" ) {
            $trending_ara[$key] = $value;
        }
    }
    if($index > 30)
        break;
}
//var_dump($trending_eng) ;
//var_dump($trending_ara) ;
?>
Upvotes: 0
Views: 316
Reputation: 65304
Sorry, but your data model is defective.
You do not normalize the tweets; instead you run a substring search for each hashtag (via hashtags LIKE '%$key%'). A pattern with a leading wildcard cannot use an index on hashtags, so every row in the time interval has to go through a CPU-intensive string comparison, and not just once: it happens on every iteration of two foreach() loops of roughly 30 iterations each.
So you do roughly 60 scans of the same rows. Good luck with that.
The correct way would be to normalize the tweets on receiving them, splitting the hashtags and creating a table similar to hashtag | user | count.
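To make the normalization idea concrete, here is a minimal sketch of what it might look like. Everything in it is an assumption for illustration: the table name tweet_hashtags, its columns, the index name, and the helper functions store_hashtags() and trending() are hypothetical and not part of your schema. It uses PDO with an in-memory SQLite database (this needs the pdo_sqlite extension) so it runs without a server; on MySQL the column types would differ. It stores one row per tweet and hashtag rather than a precomputed count, so a single grouped query can replace the per-hashtag LIKE queries.

```php
<?php
// Sketch only: table, column and function names are hypothetical.
// In-memory SQLite keeps the example self-contained; adapt types for MySQL.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// One row per (tweet, hashtag) pair instead of a space-separated string.
$db->exec("CREATE TABLE tweet_hashtags (
    hashtag    TEXT NOT NULL,
    tweeted_by TEXT NOT NULL,
    tweeted_at TEXT NOT NULL
)");
// Index on the columns used by the time filter and the grouping.
$db->exec("CREATE INDEX idx_time_tag ON tweet_hashtags (tweeted_at, hashtag)");

// Called once when a tweet is received: split the hashtags, store each one.
function store_hashtags(PDO $db, $user, $hashtags, $tweetedAt) {
    $insert = $db->prepare(
        "INSERT INTO tweet_hashtags (hashtag, tweeted_by, tweeted_at) VALUES (?, ?, ?)"
    );
    $tags = preg_split('/\s+/', trim($hashtags), -1, PREG_SPLIT_NO_EMPTY);
    foreach (array_unique(array_map('strtolower', $tags)) as $tag) {
        $insert->execute(array($tag, $user, $tweetedAt));
    }
}

// One grouped query: tweet count and distinct users per hashtag since $since,
// keeping only hashtags used by more than one user.
function trending(PDO $db, $since, $limit = 30) {
    $stmt = $db->prepare(
        "SELECT hashtag,
                COUNT(*) AS tweets,
                COUNT(DISTINCT tweeted_by) AS users
         FROM tweet_hashtags
         WHERE tweeted_at >= ?
         GROUP BY hashtag
         HAVING COUNT(DISTINCT tweeted_by) > 1
         ORDER BY tweets DESC, hashtag ASC
         LIMIT " . (int) $limit
    );
    $stmt->execute(array($since));
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Usage with made-up sample data.
$now = gmdate('Y-m-d H:i:s');
store_hashtags($db, 'alice', 'Sudan Khartoum', $now);
store_hashtags($db, 'bob', 'khartoum', $now);

$since = gmdate('Y-m-d H:i:s', time() - 2 * 86400);
$rows = trending($db, $since);
```

The HAVING clause plays the role of the $row['cnt'] > 1 check in the question, and the Arabic/English split and the stop-word filter could still be done in PHP on the much smaller result set.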
Upvotes: 3