Mr Morgan
Mr Morgan

Reputation: 2243

Optimising a slow MySQL query

enter image description here

I have a MySQL query as follows:

SELECT KeywordText, SUM(Frequency) AS Frequency FROM Keyword, Keyword_Polling_Frequency_Index
WHERE Keyword.KeywordText 
IN ('deal', 'obama' and other keywords...) 
AND RSSFeedNo IN (106, 107 and other RSS feeds) 
AND PollingDateTime 
BETWEEN '2011-10-28 13:00:00' AND '2011-10-28 13:59:00' 
AND Keyword.KeywordNo = Keyword_Polling_Frequency_Index.KeywordNo 
GROUP BY Keyword.KeywordText 
ORDER BY Keyword.KeywordText ASC

The query is used by an hourly batch program which involves two tables and is meant to get the frequencies of a list of keywords from a list of RSS feeds for a given hour. The Keyword_Polling_Frequency_Index table has a composite primary key of KeywordNo, RSSFeedNo and PollingDateTime. The query joins this table to the Keyword table which contains the KeywordText. column keywordText has a MySQL MyISAM full text index.

In testing this was found to perform satisfactorily but has now started running very slowly and affects the interactive speed of pages of the application. When I check the MySQL logs, I find that MySQL is creating temporary tables.

So, my question is, given that this query has to handle dozens of keywords in dozens of RSS feeds to calculate the frequencies, can anyone suggest an optimisation?

I have thought of breaking the query up by keyword but am not convinced of the practicality of this.

Can anyone help?

I am using MySQL Community Edition 5.X and an EXTENDED EXPLAIN of a version of this query is shown above.

SQL for the tables is as follows:

CREATE TABLE `keyword` (
`KeywordNo` int(10) unsigned NOT NULL AUTO_INCREMENT,
`KeywordText` varchar(64) NOT NULL,
`UserOriginated` enum('TRUE','FALSE') NOT NULL,
`Active` enum('TRUE','FALSE') NOT NULL,
`UserNo` varchar(50) NOT NULL,
`StopWord` enum('TRUE','FALSE') NOT NULL,
`CreatedDate` date NOT NULL,
`CreatedTime` time NOT NULL,
PRIMARY KEY (`KeywordNo`),
FULLTEXT KEY `KEYWORDTEXT` (`KeywordText`)
) ENGINE=MyISAM AUTO_INCREMENT=44047 DEFAULT CHARSET=latin1$$


CREATE TABLE `keyword_polling_frequency_index` (
`KeywordNo` int(10) unsigned NOT NULL,
`RSSFeedNo` int(10) unsigned NOT NULL,
`PollingDateTime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`Frequency` int(10) NOT NULL,
`Active` enum('TRUE','FALSE') NOT NULL,
`UserNo` varchar(50) NOT NULL,
PRIMARY KEY (`KeywordNo`,`RSSFeedNo`,`PollingDateTime`),
KEY `FK_keyword_polling_frequency_index_1` (`UserNo`),
CONSTRAINT `FK_keyword_polling_frequency_index_1` FOREIGN KEY (`UserNo`) REFERENCES `user`    (`UserNo`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1$$

Upvotes: 1

Views: 251

Answers (2)

planestepper
planestepper

Reputation: 3307

As mentioned previously, add an index to the PollingDateTime field in the order mentioned as well. This is my suggestion:

SELECT 
    K.KeywordText, 
    SUM(F.Frequency) AS Frequency 
FROM 
    Keyword K, Keyword_Polling_Frequency_Index F
WHERE 
    EXISTS
        (
        SELECT 1
        FROM Keyword K1
        WHERE
            MATCH K1.KeywordText AGAINST ('deal obama "another keyword" yetanother' IN BOOLEAN MODE)
            AND K1.KeywordNo = K.KeywordNo
        )
    AND K.KeywordNo = F.KeywordNo
    AND F.PollingDateTime BETWEEN '2011-10-28 13:00:00' AND '2011-10-28 13:59:00'
    AND F.RSSFeedNo IN (106, 107, 110)
    GROUP BY K.KeywordText 
    ORDER BY K.KeywordText ASC

This will probably reduce the number of records for the comparison (SQL inside-out parsing) instead of directly matching two tables (N x N).

Upvotes: 1

Albin Sunnanbo
Albin Sunnanbo

Reputation: 47048

If you don't have any indexes you should create relevant indexes.

The minimum index is on keyword_polling_frequency_index.PollingDateTime

Upvotes: 0

Related Questions