dead_man
dead_man

Reputation: 409

Mysql query optimization with left joins

I have the following query and this query takes 1.141 sec to execute. This query has bunch of joins. Is there a way to optimize the query ? Any kind of help is appreciated.

SELECT `cou`.`id` AS `country_id`,
                `a`.`id` AS `area_id`,
                `y`.`id` AS `year_id`,
                `su`.`id` AS `subject_id`,
                `co1`.`name` AS `course_name`,
                `ca1`.`id` AS `root_category_id`,
                `ca1`.`name` AS `root_category_name`,
                `ca4`.`id` AS `chapter_id`,
                `ca4`.`name` AS `chapter_name`,
                `ca4`.`no_of_assets` AS `no_of_assets`,
                `ca4`.`active_status` AS `status`,
                0 AS `READ_IT`,
                0 AS `WATCH_IT`,
                0 AS `PLAY_IT`,
                0 AS `PROVE_IT`,
                count(DISTINCT `pa`.`id`) AS `APROVE_IT`,
                if((count(`pa`.`id`) > 0),'True', 'False') AS `sections_with_content`,
                count(`pa`.`id`) AS `content_count`,
                `pa`.`status` AS `content_flag`
         FROM (((((((((((((((((`edu_db`.`category_relation_xref` `crx1`
                               JOIN `edu_db`.`category` `ca1` on((`crx1`.`parent_id` = `ca1`.`id`)))
                              LEFT JOIN `edu_db`.`course` `co1` on((`ca1`.`course_id` = `co1`.`id`)))
                             JOIN `edu_db`.`category_relation_xref` `crx2` on((`crx1`.`child_id` = `crx2`.`parent_id`)))
                            JOIN `edu_db`.`category` `ca2` on((`crx2`.`parent_id` = `ca2`.`id`)))
                           JOIN `edu_db`.`category` `ca3` on((`crx2`.`child_id` = `ca3`.`id`)))
                          JOIN `edu_db`.`category_relation_xref` `crx3` on((`crx2`.`child_id` = `crx3`.`parent_id`)))
                         JOIN `edu_db`.`category` `ca4` on((`crx3`.`child_id` = `ca4`.`id`)))
                        LEFT JOIN `edu_db`.`category_relation_xref` `crx4` on((`crx3`.`child_id` = `crx4`.`parent_id`)))
                       LEFT JOIN `edu_db`.`category` `ca5` on((`crx4`.`child_id` = `ca5`.`id`)))
                      JOIN `edu_db`.`course` `co2` on((`ca4`.`course_id` = `co2`.`id`)))
                     JOIN `edu_db`.`curriculum` `cu` on((`co2`.`curriculum_id` = `cu`.`id`)))
                    JOIN `edu_db`.`year` `y` on((`cu`.`year_id` = `y`.`id`)))
                   JOIN `edu_db`.`subject` `su` on((`su`.`id` = `cu`.`subject_id`)))
                  JOIN `edu_db`.`area` `a` on((`y`.`area_id` = `a`.`id`)))
                 JOIN `edu_db`.`country` `cou` on((`a`.`country_id` = `cou`.`id`)))
                LEFT JOIN `edu_db`.`qbnk_category_published_assessment_xref` `qcpa` on((`ca4`.`id` = `qcpa`.`category_id`)))
               LEFT JOIN `edu_db`.`qbnk_published_assessment` `pa` on((`qcpa`.`published_assessment_id` = `pa`.`id`)))
         WHERE ((`pa`.`status` <> 'non_active')
                AND (`qcpa`.`status` <> 'deleted'))
         GROUP BY `ca4`.`id`

This is the output of the explain command. In here there is a select type that using filesort which means that query does not use an index. Is there a way to optimize this query with indexes?

 +------+---------------+---------+--------------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+-----------+--------------------------------+--------+------------+-------------------------------------------------------------+
| "id" | "select_type" | "table" | "partitions" |  "type"  |                                                                                      "possible_keys"                                                                                      |              "key"               | "key_len" |             "ref"              | "rows" | "filtered" |                           "Extra"                           |
+------+---------------+---------+--------------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+-----------+--------------------------------+--------+------------+-------------------------------------------------------------+
| "1"  | "SIMPLE"      | "pa"    | \N           | "index"  | "PRIMARY,status"                                                                                                                                                                          | "status"                         | "2"       | \N                             | "7714" | "50.00"    | "Using where; Using index; Using temporary; Using filesort" |
| "1"  | "SIMPLE"      | "qcpa"  | \N           | "ref"    | "PRIMARY,FK_qbnk_cat_id_pub_ass_tbl_to_category_tbl_id,cat_pub_status"                                                                                                                    | "PRIMARY"                        | "4"       | "edu_db.pa.id"             | "6"    | "50.00"    | "Using where"                                               |
| "1"  | "SIMPLE"      | "ca4"   | \N           | "eq_ref" | "PRIMARY,FK_curriculum_item_id_category_tbl_to_id_curriculum_item_tbl,FK_curriculum_id_category_tbl_to_id_curriculum_tbl,name,Fk_category_tbl_course_id_to_course_tbl_id,Index 6,Index 7" | "PRIMARY"                        | "4"       | "edu_db.qcpa.category_id"  | "1"    | "100.00"   | "Using where"                                               |
| "1"  | "SIMPLE"      | "co2"   | \N           | "eq_ref" | "PRIMARY,FK_curriculum_id_to_id_curriculum_tbl"                                                                                                                                           | "PRIMARY"                        | "4"       | "edu_db.ca4.course_id"     | "1"    | "100.00"   | "Using where"                                               |
| "1"  | "SIMPLE"      | "cu"    | \N           | "eq_ref" | "PRIMARY,FK_subject_id_to_id_subject_tbl,FK_year_id_to_id_year_tble"                                                                                                                      | "PRIMARY"                        | "4"       | "edu_db.co2.curriculum_id" | "1"    | "100.00"   | "Using where"                                               |
| "1"  | "SIMPLE"      | "su"    | \N           | "eq_ref" | "PRIMARY"                                                                                                                                                                                 | "PRIMARY"                        | "4"       | "edu_db.cu.subject_id"     | "1"    | "100.00"   | "Using index"                                               |
| "1"  | "SIMPLE"      | "y"     | \N           | "eq_ref" | "PRIMARY,FK_year_tbl_area_id_to_id_area_tbl"                                                                                                                                              | "PRIMARY"                        | "4"       | "edu_db.cu.year_id"        | "1"    | "100.00"   | "Using where"                                               |
| "1"  | "SIMPLE"      | "a"     | \N           | "eq_ref" | "PRIMARY,FK_country_id_to_country_tbl"                                                                                                                                                    | "PRIMARY"                        | "4"       | "edu_db.y.area_id"         | "1"    | "100.00"   | \N                                                          |
| "1"  | "SIMPLE"      | "cou"   | \N           | "eq_ref" | "PRIMARY"                                                                                                                                                                                 | "PRIMARY"                        | "4"       | "edu_db.a.country_id"      | "1"    | "100.00"   | "Using index"                                               |
| "1"  | "SIMPLE"      | "crx3"  | \N           | "ref"    | "PRIMARY,FK_child_id_to_id_category_tbl"                                                                                                                                                  | "FK_child_id_to_id_category_tbl" | "4"       | "edu_db.qcpa.category_id"  | "1"    | "100.00"   | "Using index"                                               |
| "1"  | "SIMPLE"      | "ca3"   | \N           | "eq_ref" | "PRIMARY,Index 6,Index 7"                                                                                                                                                                 | "PRIMARY"                        | "4"       | "edu_db.crx3.parent_id"    | "1"    | "100.00"   | "Using index"                                               |
| "1"  | "SIMPLE"      | "crx2"  | \N           | "ref"    | "PRIMARY,FK_child_id_to_id_category_tbl"                                                                                                                                                  | "FK_child_id_to_id_category_tbl" | "4"       | "edu_db.crx3.parent_id"    | "1"    | "100.00"   | "Using index"                                               |
| "1"  | "SIMPLE"      | "ca2"   | \N           | "eq_ref" | "PRIMARY,Index 6,Index 7"                                                                                                                                                                 | "PRIMARY"                        | "4"       | "edu_db.crx2.parent_id"    | "1"    | "100.00"   | "Using index"                                               |
| "1"  | "SIMPLE"      | "crx1"  | \N           | "ref"    | "PRIMARY,FK_child_id_to_id_category_tbl"                                                                                                                                                  | "FK_child_id_to_id_category_tbl" | "4"       | "edu_db.crx2.parent_id"    | "1"    | "100.00"   | "Using index"                                               |
| "1"  | "SIMPLE"      | "ca1"   | \N           | "eq_ref" | "PRIMARY,Index 6,Index 7"                                                                                                                                                                 | "PRIMARY"                        | "4"       | "edu_db.crx1.parent_id"    | "1"    | "100.00"   | \N                                                          |
| "1"  | "SIMPLE"      | "co1"   | \N           | "eq_ref" | "PRIMARY"                                                                                                                                                                                 | "PRIMARY"                        | "4"       | "edu_db.ca1.course_id"     | "1"    | "100.00"   | \N                                                          |
| "1"  | "SIMPLE"      | "crx4"  | \N           | "ref"    | "PRIMARY"                                                                                                                                                                                 | "PRIMARY"                        | "4"       | "edu_db.qcpa.category_id"  | "4"    | "100.00"   | "Using index"                                               |
| "1"  | "SIMPLE"      | "ca5"   | \N           | "eq_ref" | "PRIMARY,Index 6,Index 7"                                                                                                                                                                 | "PRIMARY"                        | "4"       | "edu_db.crx4.child_id"     | "1"    | "100.00"   | "Using index"                                               |
+------+---------------+---------+--------------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+-----------+--------------------------------+--------+------------+-------------------------------------------------------------+

Following are the create code for category_relation_xref and qbnk_category_published_assessment_xref tables

 CREATE TABLE `category_relation_xref` (
    `parent_id` INT(11) NOT NULL,
    `child_id` INT(11) NOT NULL,
    `template_id` INT(11) NOT NULL DEFAULT '1',
    `possition_id` INT(11) NOT NULL DEFAULT '1',
    `comment` VARCHAR(255) NULL DEFAULT NULL COLLATE 'utf16_unicode_ci',
    `display_order` INT(11) NOT NULL DEFAULT '0',
    `created_at` TIMESTAMP NULL DEFAULT NULL,
    `updated_at` TIMESTAMP NULL DEFAULT NULL,
    `created_by` INT(11) NULL DEFAULT NULL,
    `updated_by` INT(11) NULL DEFAULT NULL,
    PRIMARY KEY (`parent_id`, `child_id`),
    INDEX `FK_child_id_to_id_category_tbl` (`child_id`),
    INDEX `FK_cat_rel_xref_tbl_template_id_to_content_template_tbl` (`template_id`),
    INDEX `FK_cat_rel_xref_tbl_possition_id_to_content_template_tbl_id` (`possition_id`),
    CONSTRAINT `FK_cat_rel_xref_tbl_possition_id_to_content_template_tbl_id` FOREIGN KEY (`possition_id`) REFERENCES `content_possition` (`id`),
    CONSTRAINT `FK_cat_rel_xref_tbl_template_id_to_content_template_tbl` FOREIGN KEY (`template_id`) REFERENCES `content_template` (`id`),
    CONSTRAINT `category_relation_xref_ibfk_1` FOREIGN KEY (`parent_id`) REFERENCES `category` (`id`),
    CONSTRAINT `category_relation_xref_ibfk_2` FOREIGN KEY (`child_id`) REFERENCES `category` (`id`)
)
COMMENT='store parent child relations'
COLLATE='utf16_unicode_ci'
ENGINE=InnoDB
;

CREATE TABLE `qbnk_category_published_assessment_xref` (
    `published_assessment_id` INT(11) NOT NULL,
    `category_id` INT(11) NOT NULL,
    `status` ENUM('active','deleted') NOT NULL DEFAULT 'active' COLLATE 'utf16_unicode_ci',
    `comment` VARCHAR(255) NULL DEFAULT NULL COLLATE 'utf16_unicode_ci',
    `created_at` TIMESTAMP NULL DEFAULT NULL,
    `updated_at` TIMESTAMP NULL DEFAULT NULL,
    `created_by` INT(11) NULL DEFAULT '0',
    `updated_by` INT(11) NULL DEFAULT '0',
    PRIMARY KEY (`published_assessment_id`, `category_id`),
    INDEX `FK_qbnk_cat_id_pub_ass_tbl_to_category_tbl_id` (`category_id`),
    INDEX `cat_pub_status` (`status`),
    CONSTRAINT `FK_qbnk_cat_id_pub_ass_tbl_to_category_tbl_id` FOREIGN KEY (`category_id`) REFERENCES `category` (`id`) ON DELETE CASCADE,
    CONSTRAINT `FK_qbnk_cat_pub_ass_id_to_pub_ass_tbl_id` FOREIGN KEY (`published_assessment_id`) REFERENCES `qbnk_published_assessment` (`id`) ON DELETE CASCADE
)
COMMENT='Store category published assessments mappings'
COLLATE='utf16_unicode_ci'
ENGINE=InnoDB
;

Upvotes: 2

Views: 85

Answers (3)

Rick James
Rick James

Reputation: 142298

I call this "over normalization".

There is no need for an extra table for country -- there are perfectly good CHAR(2) CHARACTER SET ascii values for all the countries. Note that INT is 4 bytes, so this change saves space!

That is a datatype YEAR; use it. If you have a full date or datetime, then use DATE or DATETIME and pick the YEAR out of it in the SELECT. YEAR is smaller than INT.

Read about ENUM as a possible datatype for columns that have a small number of possible values.

etc.

Upvotes: 0

DRapp
DRapp

Reputation: 48139

After cleaning up the query, stripping unnecessary tic marks and parenthesis, I believe the following is much easier to follow and directly see the relationships between the tables. You had it well done with the joins of leftAlias.leftColumn = rightAlias.rightColumn.

Now that I can see things better, lets consider your tables. They MOSTLY appear to be lookup tables where you have an ID and a descriptive column you are returning. I would CREATE covering indexes on these tables so the join resolutions can be handled directly from the indexes instead of going to the raw data pages.

One ADDITIONAL consideration on time is to add the MySQL keyword "STRAIGHT_JOIN" which tells the engine to query the tables in the order I have them listed. Dont think for me. Each of your tables starts at the topmost level and gets all the lookups down-stream. You can also compare the time by removing the STRAIGHT_JOIN clause too.. On a system I worked on YEARS ago, it took a primary table with about 15M records and 20+ lookup tables from being hung after 12+ hours processing and reduced to a completed query running in just over an hour.

Original query cleaned up

SELECT STRAIGHT_JOIN
      cou.id AS country_id,
      a.id AS area_id,
      y.id AS year_id,
      su.id AS subject_id,
      co1.`name` AS course_name,
      ca1.id AS root_category_id,
      ca1.`name` AS root_category_name,
      ca4.id AS chapter_id,
      ca4.`name` AS chapter_name,
      ca4.no_of_assets AS no_of_assets,
      ca4.active_status AS `status`,
      0 AS READ_IT,
      0 AS WATCH_IT,
      0 AS PLAY_IT,
      0 AS PROVE_IT,
      count(DISTINCT pa.id) AS APROVE_IT,
      if((count(pa.id) > 0),'True', 'False') AS sections_with_content,
      count(pa.id) AS content_count,
      pa.`status` AS content_flag
   FROM 
      edu_db.category_relation_xref crx1
         JOIN edu_db.category ca1
            ON crx1.parent_id = ca1.id
            LEFT JOIN edu_db.course co1
               ON ca1.course_id = co1.id

         JOIN edu_db.category_relation_xref crx2 
            ON crx1.child_id = crx2.parent_id

            JOIN edu_db.category ca2
               ON crx2.parent_id = ca2.id

            JOIN edu_db.category ca3
               ON crx2.child_id = ca3.id

            JOIN edu_db.category_relation_xref crx3 
               ON crx2.child_id = crx3.parent_id
               JOIN edu_db.category ca4 
                  ON crx3.child_id = ca4.id

                  JOIN edu_db.course co2
                     ON ca4.course_id = co2.id
                     JOIN edu_db.curriculum cu
                        ON co2.curriculum_id = cu.id
                        JOIN edu_db.`year` y 
                           ON cu.year_id = y.id
                           JOIN edu_db.area a
                              ON y.area_id = a.id
                              JOIN edu_db.country cou 
                                 ON a.country_id = cou.id
                        JOIN edu_db.subject su
                           ON cu.subject_id = su.id

                  LEFT JOIN edu_db.qbnk_category_published_assessment_xref qcpa
                     ON ca4.id = qcpa.category_id
                     LEFT JOIN edu_db.qbnk_published_assessment pa
                        ON qcpa.published_assessment_id = pa.id

               LEFT JOIN edu_db.category_relation_xref crx4
                  ON crx3.child_id = crx4.parent_id
                  LEFT JOIN edu_db.category ca5 
                     ON crx4.child_id = ca5.id
   WHERE 
          pa.`status` <> 'non_active'
      AND qcpa.`status` <> 'deleted'
   GROUP BY 
      ca4.id

Another item... You have some tables that are LEFT joined and not even being used in the query and can be completely removed. Explicitly the "LEFT JOIN edu_db.category ca5". You are not pulling any values from the CA5 alias, and a left-join means you didn't care about it anyhow. Similarly for the "LEFT JOIN edu_db.category_relation_xref crx4"

SELECT STRAIGHT_JOIN
      a.country_id,
      a.id AS area_id,
      y.id AS year_id,
      cu.subject_id,
      co1.`name` AS course_name,
      ca1.id AS root_category_id,
      ca1.`name` AS root_category_name,
      ca4.id AS chapter_id,
      ca4.`name` AS chapter_name,
      ca4.no_of_assets AS no_of_assets,
      ca4.active_status AS `status`,
      0 AS READ_IT,
      0 AS WATCH_IT,
      0 AS PLAY_IT,
      0 AS PROVE_IT,
      count(DISTINCT pa.id) AS APROVE_IT,
      if((count(pa.id) > 0),'True', 'False') AS sections_with_content,
      count(pa.id) AS content_count,
      pa.`status` AS content_flag
   FROM 
      edu_db.category_relation_xref crx1
         JOIN edu_db.category ca1
            ON crx1.parent_id = ca1.id
            LEFT JOIN edu_db.course co1
               ON ca1.course_id = co1.id

         JOIN edu_db.category_relation_xref crx2 
            ON crx1.child_id = crx2.parent_id
            JOIN edu_db.category_relation_xref crx3 
               ON crx2.child_id = crx3.parent_id
               JOIN edu_db.category ca4 
                  ON crx3.child_id = ca4.id
                  JOIN edu_db.course co2
                     ON ca4.course_id = co2.id
                     JOIN edu_db.curriculum cu
                        ON co2.curriculum_id = cu.id
                        JOIN edu_db.`year` y 
                           ON cu.year_id = y.id
                           JOIN edu_db.area a
                              ON y.area_id = a.id

                  JOIN edu_db.qbnk_category_published_assessment_xref qcpa
                     ON ca4.id = qcpa.category_id
                     AND qcpa.`status` <> 'deleted'

                     JOIN edu_db.qbnk_published_assessment pa
                        ON qcpa.published_assessment_id = pa.id
                        AND pa.`status` <> 'non_active'
   GROUP BY 
      ca4.id

Your WHERE clauses associated with the "pa" and "qcpa" cancel-out the LEFT JOIN portion as the where turns it into a WHERE clause. So I removed the "LEFT" component AND moved the where clause portion directly to that join component.

You are pulling the "subject" table (alias su), but only grabbing the su.id. Since you have the subject ID from the "cu" alias, you can just use the "cu.subject_id" instead and remove yet another table from the query -- UNLESS you plan on getting other descriptions from the subject table. This is probably the same for your country, area, year joins as well. If you already have an ID from a prior table, use that and drop what is not needed...

Not using the "ca2" or "ca3" aliases for any underlying extra details, descriptions, get rid of it.

So my index suggestions for each table would include the following. These will be more COVERING indexes. These should not be individual indexes on the same table, such as Tbl1 index on ID, Tbl1 index on Description, but Tbl1 index ON (id, description) as a single index.

table                                   index
qbnk_published_assessment               ( id, `status` ) 
qbnk_category_published_assessment_xref ( category_id, `status`, published_assessment_id )
area                                    ( id )
`year`                                  ( id, area_id ) 
curriculum                              ( id, year_id )
course                                  ( id, curriculum_id )
category                                ( id, course_id )
category_relation_xref                  ( parent_id, child_id )

Upvotes: 2

Willem Renzema
Willem Renzema

Reputation: 5187

Since the EXPLAIN starts with the pa and ocpa tables, we can see that the optimizer is trying to filter the results first, before doing anything else. That is smart, but another smart(er?) route would to be to perform the GROUP BY first, and then filter the results.

New Index

With that in mind, let's try to help the optimizer do that to see if we get any improvement:

ALTER TABLE qbnk_category_published_assessment_xref
ADD INDEX so52589130_qcpa (`status`,`category_id`,`published_assessment_id`)

Explanation

The above index allows starting at the category table for the GROUP BY, then use the WHERE clause on the status column with the category_id to match the two leftmost columns of that index. That gives immediate access to the published_assessment_id column without having to make a secondary lookup.

The reason it isn't trying to do this in the first place is that the indexes you have on the qbnk_category_published_assessment_xref don't allow that without a secondary lookup.

cat_pub_status allows using the status, but then it gets the published_assessment_id before the category_id, which doesn't allow it to use the category_id to quickly find just the record it wants.

FK_qbnk_cat_id_pub_ass_tbl_to_category_tbl_id allows using the category_id first, but then still has to go back to the clustered index to get the status.

Our new index allows just one trip to this table, which should help save time.

Possible additional savings

The fact that the two WHERE clauses are <> may reduce the performance a little, in comparison to if they were = values. If there are only two status values for each status column, I would swap those to use = instead. That is not overly important though, if there's more than just 2 values.

Instructions

So, give the above index a try. Run it twice, discard the first timing result as the buffer cache will be loading on that call. Reply with the second time, AND whether or not the EXPLAIN plan changed. If the EXPLAIN plan changed, and the timing is still not fast enough, add the new EXPLAIN plan to your question and I'll take another look. (I may need more table definitions at that time too, but we'll wait and see how this works out first.)

Upvotes: 0

Related Questions