Reputation: 4842
The following query works, but becomes slower as the size of the sendlog table increases over time. The goal is to select a list of all subscribers from the newsletter_subscribers table which do not have an email entry for a given newsletter id in the newsletter_sendlog table. Currently, it's taking about 2.2 seconds on my mysql server with only a couple of thousand entries in the sendlog.
SELECT `newsletter_subscribers`.*
FROM `newsletter_subscribers`
INNER JOIN `newsletter_to_subscriber`
ON newsletter_to_subscriber.subscriber_id = newsletter_subscribers.id
LEFT JOIN (
SELECT `newsletter_sendlog`.`subscriber_email`
FROM `newsletter_sendlog`
WHERE (newsletter_id='7')
) AS `sendlog`
ON newsletter_subscribers.email = sendlog.subscriber_email
WHERE (sendlog.subscriber_email IS NULL)
AND (newsletter_to_subscriber.newsletter_id = '7')
EXPLAIN(query) outputs the following:
I'm not too familiar with the output of EXPLAIN, but if I read it correctly it would suggest that it's not using the index which I've defined on newsletter_sendlog.subscriber_email. I've tried using USE INDEX(email) on that table, but it does not seem to take effect.
Any suggestions on how to optimize this? Or possibly suggest another query that does the same?
The create table for newsletter_sendlog:
CREATE TABLE `newsletter_sendlog` (
`id` int(11) unsigned NOT NULL auto_increment,
`subscriber_email` varchar(100) NOT NULL default '',
`newsletter_id` int(11) default NULL,
`sendstatus` int(11) default NULL,
`senddate` timestamp NOT NULL default CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `newsletter_id` (`newsletter_id`),
KEY `email` (`subscriber_email`)
) ENGINE=MyISAM AUTO_INCREMENT=2933 DEFAULT CHARSET=latin1;
create table for newsletter_subscribers:
CREATE TABLE `newsletter_subscribers` (
`id` int(11) unsigned NOT NULL auto_increment,
`email` varchar(100) NOT NULL default '',
`name` tinytext,
PRIMARY KEY (`id`),
KEY `email` (`email`)
) ENGINE=MyISAM AUTO_INCREMENT=2964 DEFAULT CHARSET=utf8;
create table for newsletter_to_subscriber:
CREATE TABLE `newsletter_to_subscriber` (
`id` int(11) unsigned NOT NULL auto_increment,
`newsletter_id` int(11) NOT NULL,
`subscriber_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `newsletter_subscriber` (`newsletter_id`,`subscriber_id`)
) ENGINE=MyISAM AUTO_INCREMENT=2964 DEFAULT CHARSET=latin1;
Create table for newsletter_to_subscriber now looks like this after adding index on subscriber_id:
CREATE TABLE `newsletter_to_subscriber` (
`id` int(11) unsigned NOT NULL auto_increment,
`newsletter_id` int(11) NOT NULL,
`subscriber_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `newsletter_subscriber` (`newsletter_id`,`subscriber_id`),
KEY `subscriber` (`subscriber_id`)
) ENGINE=MyISAM AUTO_INCREMENT=2964 DEFAULT CHARSET=latin1;
The explain for the query suggested by @nobody:
Upvotes: 0
Views: 458
Reputation: 10635
Firstly you don't need that sub-query:
SELECT `newsletter_subscribers`.*
FROM `newsletter_subscribers`
INNER JOIN `newsletter_to_subscriber`
ON( newsletter_to_subscriber.subscriber_id = newsletter_subscribers.id )
LEFT JOIN `newsletter_sendlog`
ON( newsletter_subscribers.email = newsletter_sendlog.subscriber_email AND
newsletter_sendlog.newsletter_id = '7' )
WHERE newsletter_sendlog.subscriber_email IS NULL
the query above will do the job.
Secondly in newsletter_to_subscriber
you have a multi-part index on newsletter_id
and subscriber_id
which can't be used by your query because it will be searching for subscriber_id
and that comes second in the index, you need to have a separate index on subscriber_id
:
INDEX( subscriber_id )
Upvotes: 0
Reputation: 29619
Not totally sure, but I think the index is being ignored because you're looking for NULL values.
There's a different, hopefully more efficient way to run this query though:
select *
from newsletter_subscribers
where email not in
(select subscriber_email
from newsletter_sendlog
where newsletter_id='7')
Upvotes: 0
Reputation:
SELECT `newsletter_subscribers`.* FROM `newsletter_subscribers`
INNER JOIN `newsletter_to_subscriber`
ON newsletter_to_subscriber.subscriber_id = newsletter_subscribers.id
LEFT JOIN (
SELECT `newsletter_sendlog`.`subscriber_email` FROM `newsletter_sendlog`
WHERE (newsletter_id='7')) AS `sendlog`
ON newsletter_subscribers.email=sendlog.subscriber_email
WHERE (sendlog.subscriber_email IS NULL)
AND (newsletter_to_subscriber.newsletter_id = '7')
Could you try implementing index key on single column newsletter_to_subscriber.subscriber_id
and see if it helps?
Try using table structure as below:
CREATE TABLE `newsletter_to_subscriber` (
`id` int(11) unsigned NOT NULL auto_increment,
`newsletter_id` int(11) NOT NULL,
`subscriber_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `newsletter_subscriber` (`newsletter_id`,`subscriber_id`)
KEY `subscriber_id_key` (`subscriber_id`)
KEY `newsletter_id_key` (`newsletter_id`)
) ENGINE=MyISAM AUTO_INCREMENT=2964 DEFAULT CHARSET=latin1;
Upvotes: 0
Reputation: 1281
It is always better to select the specific fields, in stead of an asterisk (*), and avoid backticks (`). Try to see if the following (rewritten) query works better:
SELECT
newsletter_subscribers.id,
newsletter_subscribers.email,
newsletter_subscribers.name
FROM
newsletter_subscribers
LEFT JOIN
newsletter_to_subscriber
ON
newsletter_to_subscriber.subscriber_id = newsletter_subscribers.id
LEFT JOIN
newsletter_sendlog
ON
newsletter_subscribers.email = newsletter_sendlog.subscriber_email
WHERE
newsletter_to_subscriber.newsletter_id = 7
AND
newsletter_sendlog.newsletter_id = 7
AND
newsletter_sendlog.subscriber_email IS NULL
Upvotes: 1