Reputation: 59
I've always struggled with mysql joins but have started incorporating more but struggling to understand despite reading dozens of tutorials and mysql manual.
My situation is I have 3 tables:
/* BASICALLY A TABLE THAT HOLDS FAN RECORDS */
CREATE TABLE `fans` ( `id` int(11) unsigned NOT NULL AUTO_INCREMENT, `first_name` varchar(255) DEFAULT NULL, `middle_name` varchar(255) DEFAULT NULL, `last_name` varchar(255) DEFAULT NULL, `email` varchar(255) DEFAULT NULL, `join_date` datetime DEFAULT NULL, `twitter` varchar(255) DEFAULT NULL, `twitterCrawled` datetime DEFAULT NULL, `twitterImage` varchar(255) DEFAULT NULL, PRIMARY KEY (`id`), UNIQUE KEY `email` (`email`) ) ENGINE=MyISAM AUTO_INCREMENT=20413 DEFAULT CHARSET=latin1; /* A TABLE OF OUR TWITTER FOLLOWERS */ CREATE TABLE `twitterFollowers` ( `id` int(11) unsigned NOT NULL AUTO_INCREMENT, `screenName` varchar(25) DEFAULT NULL, `twitterId` varchar(25) DEFAULT NULL, `customerId` int(11) DEFAULT NULL, `uniqueStr` varchar(50) DEFAULT NULL, PRIMARY KEY (`id`), UNIQUE KEY `unique` (`uniqueStr`) ) ENGINE=InnoDB AUTO_INCREMENT=13426 DEFAULT CHARSET=utf8; /* TABLE THAT SUGGESTS A LIKELY MATCH OF A TWITTER FOLLOWER BASED ON THE EMAIL / SCREEN NAME COMPARISON OF THE FAN vs OUR FOLLOWERS IF SOMEONE (ie. a moderator) CONFIRMS OR DENIES THAT IT'S A GOOD MATCH THEY PUT A DATESTAMP IN `dismissed` */ CREATE TABLE `contentSuggestion` ( `id` int(11) unsigned NOT NULL AUTO_INCREMENT, `userId` int(11) DEFAULT NULL, `fanId` int(11) DEFAULT NULL, `twitterAccountId` int(11) DEFAULT NULL, `contentType` varchar(50) DEFAULT NULL, `contentString` varchar(255) DEFAULT NULL, `added` datetime DEFAULT NULL, `dismissed` datetime DEFAULT NULL, `uniqueStr` varchar(255) DEFAULT NULL, PRIMARY KEY (`id`), UNIQUE KEY `unstr` (`uniqueStr`) ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;
What I'm trying to get is:
SELECT [fan columns] WHERE fan screen name IS IN twitterfollowers AND WHERE fan screen name IS NOT IN contentSuggestion (with a datestamp in dismissed)
My attempts so far:
~33 seconds
SELECT fans.id, tf.screenName as col1, tf.twitterId as col2 FROM fans LEFT JOIN twitterFollowers tf ON tf.screenName = fans.emailUsername LEFT JOIN contentSuggestion cs ON cs.contentString = tf.screenName WHERE dismissed IS NULL GROUP BY(fans.id) HAVING col1 != ''
~14 seconds
SELECT id, emailUsername FROM fans WHERE emailUsername IN(SELECT DISTINCT(screenName) FROM twitterFollowers) AND emailUsername NOT IN(SELECT DISTINCT(contentString) FROM contentSuggestion WHERE dismissed IS NULL) GROUP BY (fans.id);
9.53 seconds
SELECT fans.id, tf.screenName as col1, tf.twitterId as col2 FROM fans LEFT JOIN twitterFollowers tf ON tf.screenName = fans.emailUsername WHERE tf.uniqueStr NOT IN(SELECT uniqueStr FROM contentSuggestion WHERE dismissed IS NULL)
I hope there is a better way. I've been struggling to really use JOINS outside of a single LEFT JOIN which has already helped me speed up other queries by a significant amount.
Thanks for any help you can give me.
Upvotes: 1
Views: 49
Reputation: 1270873
I would go with a variation of the second method. Instead of IN
, use EXISTS
. Then add the correct indexes and remove the aggregation:
SELECT f.id, f.emailUsername
FROM fans f
WHERE EXISTS (SELECT 1
FROM twitterFollowers tf
WHERE f.emailUsername = tf.screenName
) AND
NOT EXISTS (SELECT 1
FROM contentSuggestion cs
WHERE f.emailUsername = cs.contentString AND
cs.dismissed IS NULL
) ;
Then be sure you have the following indexes: twitterFollowers(screenName)
and contentSuggestion(contentString, dismissed)
.
Some notes:
IN
, don't use SELECT DISTINCT
. I'm not 100% sure that MySQL is always smart enough to ignore the DISTINCT
in the subquery (it is redundant).EXISTS
was faster than IN
in MySQL. The optimizer has improved in recent versions.twitterFollowers(screenName)
and contentSuggestion(contentString, dismissed)
.fan.id
is unique (a very reasonable assumption), you don't need the final group by
.Upvotes: 0