Reputation: 5103
I have this query (shown below) which currently uses temporary and filesort in order to generate a grouped by set of ordered results. I would like to get rid of their usage if possible. I have looked into the underlying indexes used in this query and I just can't see what is missing.
SELECT
b.institutionid AS b__institutionid,
b.name AS b__name,
COUNT(DISTINCT f2.facebook_id) AS f2__0
FROM education_institutions b
LEFT JOIN facebook_education_matches f ON b.institutionid = f.institutionid
LEFT JOIN facebook_education f2 ON f.school_uid = f2.school_uid
WHERE
(
b.approved = '1'
AND f2.facebook_id IN ( [lots of facebook ids here ])
)
GROUP BY b__institutionid
ORDER BY f2__0 DESC
LIMIT 10
Here is the output for EXPLAIN EXTENDED
:
+----+-------------+-------+--------+--------------------------------+----------------+---------+----------------------------------+------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+--------+--------------------------------+----------------+---------+----------------------------------+------+----------+----------------------------------------------+
| 1 | SIMPLE | f | index | PRIMARY,institutionId | institutionId | 4 | NULL | 308 | 100.00 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | f2 | ref | facebook_id_idx,school_uid_idx | school_uid_idx | 9 | f.school_uid | 1 | 100.00 | Using where |
| 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4 | f.institutionId | 1 | 100.00 | Using where |
+----+-------------+-------+--------+--------------------------------+----------------+---------+----------------------------------+------+----------+----------------------------------------------+
The CREATE TABLE
statements for each table are shown below so you know the schema.
CREATE TABLE facebook_education (
education_id int(11) NOT NULL AUTO_INCREMENT,
name varchar(255) DEFAULT NULL,
school_uid bigint(20) DEFAULT NULL,
school_type varchar(255) DEFAULT NULL,
year smallint(6) DEFAULT NULL,
facebook_id bigint(20) DEFAULT NULL,
degree varchar(255) DEFAULT NULL,
PRIMARY KEY (education_id),
KEY facebook_id_idx (facebook_id),
KEY school_uid_idx (school_uid),
CONSTRAINT facebook_education_facebook_id_facebook_user_facebook_id FOREIGN KEY (facebook_id) REFERENCES facebook_user (facebook_id)
) ENGINE=InnoDB AUTO_INCREMENT=484 DEFAULT CHARSET=utf8;
CREATE TABLE facebook_education_matches (
school_uid bigint(20) NOT NULL,
institutionId int(11) NOT NULL,
created_at timestamp NULL DEFAULT NULL,
updated_at timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (school_uid),
KEY institutionId (institutionId),
CONSTRAINT fk_facebook_education FOREIGN KEY (school_uid) REFERENCES facebook_education (school_uid) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT fk_education_institutions FOREIGN KEY (institutionId) REFERENCES education_institutions (institutionId) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT;
CREATE TABLE education_institutions (
institutionId int(11) NOT NULL AUTO_INCREMENT,
name varchar(100) NOT NULL,
type enum('School','Degree') DEFAULT NULL,
approved tinyint(1) NOT NULL DEFAULT '0',
deleted tinyint(1) NOT NULL DEFAULT '0',
normalisedName varchar(100) NOT NULL,
created_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (institutionId)
) ENGINE=InnoDB AUTO_INCREMENT=101327 DEFAULT CHARSET=utf8;
Any guidance would be greatly appreciated.
Upvotes: 4
Views: 2285
Reputation: 432180
The filesort probably happens because you have no suitable index for the ORDER BY
It's mentioned in the MySQL "ORDER BY Optimization" docs.
What you can do is load a temp table, select from that afterwards. When you load the temp table, use ORDER BY NULL
. When you select from the temp table, use ORDER BY .. LIMIT
The issue is that group by adds an implicit order by <group by clause> ASC
unless you disable that behavior by adding a order by null
.
It's one of those MySQL specific gotcha's.
Upvotes: 4
Reputation: 1963
I can see two possible optimizations,
b.approved = '1' - You definitely need an index on approved column for quick filtering.
f2.facebook_id IN ( [lots of facebook ids here ]) ) - Store the facebook ids in a temp table,. Then create an index on the temp table and then join with the temp table instead of using IN clause.
Upvotes: 0