Jon Walker
Jon Walker

Reputation: 59

Optimization Needed For Dual Left Join Query

I've always struggled with mysql joins but have started incorporating more but struggling to understand despite reading dozens of tutorials and mysql manual.

My situation is I have 3 tables:

/* BASICALLY A TABLE THAT HOLDS FAN RECORDS */

    CREATE TABLE `fans` (
      `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
      `first_name` varchar(255) DEFAULT NULL,
      `middle_name` varchar(255) DEFAULT NULL,
      `last_name` varchar(255) DEFAULT NULL,
      `email` varchar(255) DEFAULT NULL,
      `join_date` datetime DEFAULT NULL,
      `twitter` varchar(255) DEFAULT NULL,
      `twitterCrawled` datetime DEFAULT NULL,
      `twitterImage` varchar(255) DEFAULT NULL,
      PRIMARY KEY (`id`),
      UNIQUE KEY `email` (`email`)
    ) ENGINE=MyISAM AUTO_INCREMENT=20413 DEFAULT CHARSET=latin1;

    /* A TABLE OF OUR TWITTER FOLLOWERS */

    CREATE TABLE `twitterFollowers` (
      `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
      `screenName` varchar(25) DEFAULT NULL,
      `twitterId` varchar(25) DEFAULT NULL,
      `customerId` int(11) DEFAULT NULL,
      `uniqueStr` varchar(50) DEFAULT NULL,
      PRIMARY KEY (`id`),
      UNIQUE KEY `unique` (`uniqueStr`)
    ) ENGINE=InnoDB AUTO_INCREMENT=13426 DEFAULT CHARSET=utf8;

    /* TABLE THAT SUGGESTS A LIKELY MATCH OF A TWITTER FOLLOWER BASED ON THE EMAIL / SCREEN NAME COMPARISON OF THE FAN vs OUR FOLLOWERS 
    IF SOMEONE (ie. a moderator) CONFIRMS OR DENIES THAT IT'S A GOOD MATCH THEY PUT A DATESTAMP IN `dismissed` */

    CREATE TABLE `contentSuggestion` (
      `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
      `userId` int(11) DEFAULT NULL,
      `fanId` int(11) DEFAULT NULL,
      `twitterAccountId` int(11) DEFAULT NULL,
      `contentType` varchar(50) DEFAULT NULL,
      `contentString` varchar(255) DEFAULT NULL,
      `added` datetime DEFAULT NULL,
      `dismissed` datetime DEFAULT NULL,
      `uniqueStr` varchar(255) DEFAULT NULL,
      PRIMARY KEY (`id`),
      UNIQUE KEY `unstr` (`uniqueStr`)
    ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;

What I'm trying to get is:

SELECT [fan columns] WHERE fan screen name IS IN twitterfollowers AND WHERE fan screen name IS NOT IN contentSuggestion (with a datestamp in dismissed)

My attempts so far:

~33 seconds

SELECT fans.id, tf.screenName as col1, tf.twitterId as col2 FROM fans LEFT JOIN twitterFollowers tf ON tf.screenName = fans.emailUsername LEFT JOIN contentSuggestion cs ON cs.contentString = tf.screenName WHERE dismissed IS NULL GROUP BY(fans.id) HAVING col1 != ''

~14 seconds

SELECT id, emailUsername FROM fans WHERE emailUsername IN(SELECT DISTINCT(screenName) FROM twitterFollowers) AND emailUsername NOT IN(SELECT DISTINCT(contentString) FROM contentSuggestion WHERE dismissed IS NULL) GROUP BY (fans.id);

9.53 seconds

SELECT fans.id, tf.screenName as col1, tf.twitterId as col2 FROM fans LEFT JOIN twitterFollowers tf ON tf.screenName = fans.emailUsername WHERE tf.uniqueStr NOT IN(SELECT uniqueStr FROM contentSuggestion WHERE dismissed IS NULL)

I hope there is a better way. I've been struggling to really use JOINS outside of a single LEFT JOIN which has already helped me speed up other queries by a significant amount.

Thanks for any help you can give me.

Upvotes: 1

Views: 49

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270873

I would go with a variation of the second method. Instead of IN, use EXISTS. Then add the correct indexes and remove the aggregation:

SELECT f.id, f.emailUsername
FROM fans f
WHERE EXISTS (SELECT 1
              FROM twitterFollowers tf
              WHERE f.emailUsername = tf.screenName
             ) AND
      NOT EXISTS (SELECT 1
                  FROM contentSuggestion cs
                  WHERE f.emailUsername = cs.contentString AND
                        cs.dismissed IS NULL
                 ) ;

Then be sure you have the following indexes: twitterFollowers(screenName) and contentSuggestion(contentString, dismissed).

Some notes:

  • When using IN, don't use SELECT DISTINCT. I'm not 100% sure that MySQL is always smart enough to ignore the DISTINCT in the subquery (it is redundant).
  • Historically, EXISTS was faster than IN in MySQL. The optimizer has improved in recent versions.
  • For performance, you need the correct indexes. Then be sure you have the following indexes: twitterFollowers(screenName) and contentSuggestion(contentString, dismissed).
  • Assuming that fan.id is unique (a very reasonable assumption), you don't need the final group by.

Upvotes: 0

Related Questions