Jung
Jung

Reputation: 63

Mysql query not optimized and very slow, but why?

in the software that i develop, a car delear software, there's a section with the agenda with all the appointments of the users.

This section is pretty fast to load with a daily and normal use of the agenda, thousands of rows, but start to be really slow when the agenda tables reach 1 million of rows.

The structure:

1) Main table

CREATE TABLE IF NOT EXISTS `agenda` (
  `id_agenda` int(11) NOT NULL AUTO_INCREMENT,
  `id_user` int(11) NOT NULL DEFAULT '0',
  `id_agency` int(11) NOT NULL DEFAULT '0',
  `id_customer` int(11) DEFAULT NULL,
  `id_car` int(11) DEFAULT NULL,
  `id_owner` int(11) DEFAULT NULL,
  `type` int(11) NOT NULL DEFAULT '8',
  `title` varchar(255) NOT NULL DEFAULT '',
  `text` text NOT NULL,
  `start_day` date NOT NULL DEFAULT '0000-00-00',
  `end_day` date NOT NULL DEFAULT '0000-00-00',
  `start_hour` time NOT NULL DEFAULT '00:00:00',
  `end_hour` time NOT NULL DEFAULT '00:00:00'
  PRIMARY KEY (`id_agenda`),
  KEY `start_day` (`start_day`),
  KEY `id_customer` (`id_customer`),
  KEY `id_car` (`id_car`),
  KEY `id_user` (`id_user`),
  KEY `id_owner` (`id_owner`),
  KEY `type` (`type`),
  KEY `id_agency` (`id_agency`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 ;

2) Secondary table

CREATE TABLE IF NOT EXISTS `agenda_cars` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `id_agenda` int(11) NOT NULL,
  `id_car` int(11) NOT NULL,
  `id_owner` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `id_agenda` (`id_agenda`),
  KEY `id_car` (`id_car`),
  KEY `id_owner` (`id_owner`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1

Query:

SELECT a.id_agenda
FROM agenda as a
LEFT JOIN agenda_cars as agc on agc.id_agenda = a.id_agenda
WHERE 
(a.id_customer = '22'  OR (a.id_owner = '22' OR agc.id_owner = '22' ))
GROUP BY a.id_agenda
ORDER BY a.start_day,  a.start_hour

Explain:

id  select_type table   type    possible_keys  key         key_len    ref   rows    Extra   
1   SIMPLE       a      index   PRIMARY        PRIMARY      4          NULL 1051987 Using temporary; Using filesort
1   SIMPLE       agc    ref     id_agenda      id_agenda    4   db.a.id_agenda  1   Using where

The query reachs 10 secs to end, with the id 22, but with other id can reach also 20 secs, this just for the query, to load all in the web page take of course more time.

I don't get the point why it takes so long to get the data, i think the indexes are right configured and the query is pretty simple, so why?

Too much data?

I've solved in this way:

SELECT a.id_agenda
  FROM
      (
              SELECT id_agenda
              FROM agenda  
              WHERE (id_customer = '22'  OR  id_owner = '22' )
          UNION
              SELECT id_agenda
              FROM  agenda_cars
              WHERE id_owner = '22'
      )  as at
INNER JOIN agenda as a on a.id_agenda = at.id_agenda
GROUP BY a.id_agenda
ORDER BY a.start_day,  a.start_hour

This version of the query is ten times faster the then previous...but why?

Thanks to all want to contribute to solve my doubts!

UPDATE AFTER Rick James solution:

Query suggested

SELECT  a.id_agenda
    FROM  
    (
        SELECT  id_agenda  FROM  agenda  WHERE  id_customer = '22'
        UNION DISTINCT
        SELECT  id_agenda  FROM  agenda  WHERE  id_owner = '22'
        UNION DISTINCT
        SELECT  id_agenda  FROM  agenda_cars  WHERE  id_owner = '22'
    ) as at
    INNER JOIN  agenda as a  ON a.id_agenda = at.id_agenda
    ORDER BY  a.start_datetime;

Result: 279 total, 0.0111 sec

EXPLAIN:

id      select_type     table           type    possible_keys   key         key_len     ref             rows        Extra
1       PRIMARY         <derived2>      ALL     NULL            NULL        NULL        NULL            366         Using temporary; Using filesort
1       PRIMARY         a               eq_ref  PRIMARY         PRIMARY     4           at.id_agenda    1           NULL
2       DERIVED         agenda          ref     id_customer     id_customer 5           const           1           Using index
3       UNION           agenda          ref     id_owner        id_owner    5           const           114         Using index
4       UNION           agenda_cars     ref     id_owner        id_owner    4           const           250         NULL
NULL    UNION RESULT    <union2,3,4>    ALL     NULL            NULL        NULL        NULL            NULL        Using temporary

Upvotes: 1

Views: 146

Answers (1)

Rick James
Rick James

Reputation: 142278

Before I dig into what can be done, let me list several reg flags I see.

  • OR is hard to optimize
  • Filtering (WHERE) on multiple tables JOINed together is hard to optimize.
  • GROUP BY x ORDER BY z means two passes over the data, usually 2 temp tables and filesorts.
  • Did you really mean LEFT? It says "the right table (agc) might be missing, in which case provide NULLs".

(You may not be able to get rid of all of the red flags.)

Red flags in the Schema:

  • Indexing every column -- usually not useful
  • Only single-column indexes -- "composite" indexes often help.
  • DATE and TIME as separate columns -- usually makes for clumsy queries.

OK, those are off my shoulder, now to study the query... (Oh, and thanks for providing the CREATEs and EXPLAIN!)

The ON implies a 1:many relationship between agenda:agenda_cars. Is that correct?

id_owner and id_car are in both tables, yet are not included in the ON; what's up?

(Here's the meat of the answer to your final question.) Why have GROUP BY? I see no aggregates. I will guess that the 1:many relationship lead to multiple rows, and you needed to de-dup? For dedupping, please use DISTINCT. But, the real solution is to avoid the "inflate (JOIN) - deflate (GROUP BY)" syndrome. Your subquery is a good start on that.

Rolling some of the above comments in, plus more:

SELECT  a.id_agenda
    FROM  
    (
        SELECT  id_agenda  FROM  agenda  WHERE  id_customer = '22'
        UNION DISTINCT
        SELECT  id_agenda  FROM  agenda  WHERE  id_owner = '22'
        UNION DISTINCT
        SELECT  id_agenda  FROM  agenda_cars  WHERE  id_owner = '22'
    ) as at
    INNER JOIN  agenda as a  ON a.id_agenda = at.id_agenda
    ORDER BY  a.start_datetime;

Notes:

  • Got rid of the other OR
  • Explicit UNION DISTINCT to be clear that dups are expected.
  • Toss GROUP BY and not using SELECT DISTINCT; UNION DISTINCT deals with the need.
  • You have the 4 necessary indexes (one per subquery): (id_customer), (id_owner) (on both tables) and PRIMARY KEY(id_agenda).
  • The indexes are "covering indexes for all the subqueries -- an extra bonus.
  • There will be one unavoidable tmp table and file sort -- for the ORDER BY, but it won't be on a million rows.
  • (No need for composite indexes -- this time.)
  • I changed to a DATETIME; change back if you have a good reason for splitting them.

Did I get you another 10x? Did I explain it sufficiently?

Oh, one more thing...

This query returns an list of ids ordered by something that it does not return (date+time). What will you do with ids? If you are using this as a subquery in another table, then the Optimizer has a right to throw away the ORDER BY. Just warning you.

Upvotes: 3

Related Questions