Reputation: 349
I have a table with 2 million records.
Here is the table
comments
---------
+-------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+----------------+
| commentid | int(11) | NO | PRI | NULL | auto_increment |
| parentid | int(11) | YES | | 0 | |
| refno | int(11) | YES | | 0 | |
| createdate | int(11) | YES | MUL | 0 | |
| remoteip | varchar(80) | YES | | | |
| fingerprint | varchar(50) | YES | | | |
| locid | int(11) | YES | MUL | 0 | |
| clubid | int(11) | YES | | 0 | |
| profileid | int(11) | YES | MUL | 0 | |
| userid | int(11) | YES | MUL | 0 | |
| global | int(11) | YES | | 0 | |
| official | int(11) | YES | | 0 | |
| legacyuser | int(11) | YES | MUL | 0 | |
| mediaid | int(11) | YES | | 0 | |
| status | int(11) | YES | | 1 | |
| comment | varchar(4000) | YES | | | |
| likes | int(11) | YES | | 0 | |
| dislikes | int(11) | YES | | 0 | |
| import | int(11) | YES | | 0 | |
| author | varchar(50) | YES | | | |
+-------------+---------------+------+-----+---------+----------------+
Now this query against 2 million records takes 6 to 7 seconds:
select * from comments where (locid=2085 or global=1) and status>0 order by createdate desc limit 20;
I decided to add an index to locid and it still produces results in 6 to 7 seconds
I could have used a sqlfiddle but it would have been unneccessary since the basis of this question pertains to performance and I'm not adding 2mil records to sqlfiddle.
Are there any strategies or implementations which can get this query into the 3 seconds range?
Thanks!
UPDATE
This is my explain show table.
| comments | CREATE TABLE `comments` (
`commentid` int(11) NOT NULL AUTO_INCREMENT,
`parentid` int(11) DEFAULT '0',
`refno` int(11) DEFAULT '0',
`createdate` int(11) DEFAULT '0',
`remoteip` varchar(80) DEFAULT '',
`fingerprint` varchar(50) DEFAULT '',
`locid` int(11) DEFAULT '0',
`clubid` int(11) DEFAULT '0',
`profileid` int(11) DEFAULT '0',
`userid` int(11) DEFAULT '0',
`global` int(11) DEFAULT '0',
`official` int(11) DEFAULT '0',
`legacyuser` int(11) DEFAULT '0',
`mediaid` int(11) DEFAULT '0',
`status` int(11) DEFAULT '1',
`comment` varchar(4000) DEFAULT '',
`likes` int(11) DEFAULT '0',
`dislikes` int(11) DEFAULT '0',
`import` int(11) DEFAULT '0',
`author` varchar(50) DEFAULT '',
PRIMARY KEY (`commentid`),
KEY `comments_locid` (`locid`),
KEY `comments_userid` (`userid`),
KEY `idx_legacyusers` (`legacyuser`),
KEY `profile_index` (`profileid`),
KEY `comments_createdate` (`createdate`),
KEY `compound_for_comments` (`locid`,`global`,`status`),
KEY `global` (`global`),
KEY `status` (`status`)
) ENGINE=InnoDB AUTO_INCREMENT=3848451 DEFAULT CHARSET=latin1
Upvotes: 1
Views: 57
Reputation: 142208
This is likely to be even faster than the queries mentioned in two answer so far:
SELECT c.*
FROM (
( SELECT commentid, createdate
FROM comments
WHERE locid=2085
AND status > 0
ORDER BY createdate DESC
LIMIT 20
)
UNION DISTINCT
( SELECT commentid, createdate
FROM comments
WHERE global=1
AND status > 0
ORDER BY createdate DESC
LIMIT 20
)
ORDER BY createdate DESC
LIMIT 20
) x
JOIN comments c USING (commentid);
With both of these "covering" indexes:
INDEX(locid, status, createdate, commentid)
INDEX(global, status, createdate, commentid)
(based on later info) Since (global=1) is usually true and (status>0) is usually false, the following may be better. (There is a question of whether DESC
adds a monkey wrench.)
INDEX(locid, createdate, status, commentid)
INDEX(global, createdate, status, commentid)
There is still a risk with global
. If it is 'usually' 1
, then the above indexes may not be optimal.
This formulation will be faster because the subqueries will be entirely in the index ("covering"), rather than lugging around all the columns (*
). That does require an extra SELECT
, but it is an efficient JOIN
on the PRIMARY KEY
of only 20 rows. If your table ever became too big to cache, this would be a big performance bonus.
I was explicit with UNION DISTINCT
on the assumption that you would get dups otherwise. If not, then UNION ALL
would be faster.
Schema critique:
INTs
-- INT
is 4 bytes; TINYINT
is only 1; etc.UNSIGNED
where appropriate (especially, for ids and counts).NOT NULL
where appropriate.global
? status
?) by themselves; the index is very unlikely to be used.status
hold? If status>0
can be replaced by status=1
, my suggested index will work even better.Making the data smaller may speed up this query (and others).
Upvotes: 0
Reputation: 31772
Try this one:
select distinct * from (
select * from (
select * from comments where locid=2085 and status>0 order by commentid desc limit 20
) t1
union all
select * from (
select * from comments where global=1 and status>0 order by commentid desc limit 20
) t2
) t
order by commentid desc
limit 20
with indexes on (locid, status) and (global, status). (status, global) might be better than (global, status) - it depends on which column is more selective.
That only works if createdate
is sorted equal to commentid
. Othewise you would need indexes like (locid, status, createdate) and order by createdate
.
Upvotes: 0
Reputation: 424973
Most DBs, especially MySQL, are notoriously bad with or
.
You can eradicate the or
by splitting the query into a union
with each half handling one side of the or
, like this:
select * from (
select * from comments
where locid = 2085
and status > 0
union
select * from comments
where global = 1
and status > 0) x
order by createdate desc
limit 20
Upvotes: 2
Reputation: 1019
I believe 'order by' is causing this to take so much time. Remove order by and see if it is changed . You can order by the Primary Key because the later records have bigger assigned Primary Key which is a key and faster. Other option is to use an engine that is stored on memory rather than hard disk.
Upvotes: 0