Jaylen
Jaylen

Reputation: 40291

How to improve sub query performance in MySQL

I have a CRM system that generates attributes using an EAV model. The problem as you may be very aware that EAV model require a complex queries to pull the data. Each attribute have to be returned in a separate column.

When using sub queries, MySQL performance sucks. I have to find a better way to write my queries by analyzing them using the giving where clause, sort order and the limit "if any"!

By sub query I am refereeing to a query that look like this

SELECT a.account_name, a.account_type, a.status, a.account_id, s.fieldValue, s2.last_training_on, s3.fieldValue
FROM accounts AS a
INNER JOIN clients AS c ON c.client_id = a.client_id
LEFT JOIN (
    SELECT p.related_to AS account_id, decimal_value AS fieldValue
    FROM df_answers_text AS p
    INNER JOIN df_field_to_client_relation AS r ON r.field_id = p.field_id
    WHERE p.field_id = '19' AND r.client_id = '7';
) AS s ON s.account_id = a.account_id
LEFT JOIN (
    SELECT p.related_to AS account_id, datetime_value AS last_training_on
    FROM df_answers_text AS p
    INNER JOIN df_field_to_client_relation AS r ON r.field_id = p.field_id
    WHERE p.field_id = '10' AND r.client_id = '7';
) AS s2 ON s2.account_id = a.account_id
LEFT JOIN (
    SELECT
      p.related_to
    , CAST(GROUP_CONCAT(o.label SEPARATOR " | ") AS CHAR(255)) AS fieldValue
    FROM df_answer_predefined AS p
    INNER JOIN df_fields_options AS o ON o.option_id = p.option_id
    INNER JOIN df_field_to_client_relation AS r ON r.field_id = o.field_id
    WHERE o.is_place_holder = 0 AND o.field_id = '16' AND r.field_id = '16' AND r.client_id = '7'
    GROUP BY p.related_to;
) AS s3 ON s3.related_to = a.account_id
WHERE c.client_id = '7' AND c.status = 'Active' AND ( a.account_type = 'TEST' OR a.account_type = 'VALUE' OR s2.last_training_on > '2015-01-01 00:00:00') AND (s.fieldValue = 'Medium' OR s.fieldValue = 'Low' OR a.expType = 'Very High')
ORDER BY a.account_name
LIMIT 500;

I thought about creating a temporary table using MEMORY engine with the content of the sub query like this

CREATE TEMPORARY TABLE s (KEY(account_id, fieldValue)) ENGINE = MEMORY
SELECT p.related_to AS account_id, decimal_value AS fieldValue
FROM df_answers_text AS p
INNER JOIN df_field_to_client_relation AS r ON r.field_id = p.field_id
WHERE p.field_id = '19' AND r.client_id = '7';

CREATE TEMPORARY TABLE s2 (KEY(account_id, INDEX USING BTREE last_training_on)) ENGINE = MEMORY
SELECT p.related_to AS account_id, datetime_value AS last_training_on
FROM df_answers_text AS p
INNER JOIN df_field_to_client_relation AS r ON r.field_id = p.field_id
WHERE p.field_id = '10' AND r.client_id = '7';


    CREATE TEMPORARY TABLE s3 (KEY(related_to, fieldValue)) ENGINE = MEMORY
    SELECT
      p.related_to
    , CAST(GROUP_CONCAT(o.label SEPARATOR " | ") AS CHAR(255)) AS fieldValue
    FROM df_answer_predefined AS p
    INNER JOIN df_fields_options AS o ON o.option_id = p.option_id
    INNER JOIN df_field_to_client_relation AS r ON r.field_id = o.field_id
    WHERE o.is_place_holder = 0 AND o.field_id = '16' AND r.field_id = '16' AND r.client_id = '7'
    GROUP BY p.related_to;


    CREATE TEMPORARY TABLE s3 (KEY(related_to)) ENGINE = MEMORY
    SELECT
      p.related_to
    , CAST(GROUP_CONCAT(o.label SEPARATOR " | ") AS CHAR(255)) AS fieldValue
    FROM df_answer_predefined AS p
    INNER JOIN df_fields_options AS o ON o.option_id = p.option_id
    INNER JOIN df_field_to_client_relation AS r ON r.field_id = o.field_id
    WHERE o.is_place_holder = 0 AND o.field_id = '16' AND r.field_id = '16' AND r.client_id = '7'
    GROUP BY p.related_to;


Then my new query will look like this

    SELECT a.account_name, a.account_type, a.status, a.account_id, s.fieldValue, s2.last_training_on, s3.fieldValue
    FROM accounts AS a
    INNER JOIN clients AS c ON c.client_id = a.client_id
    LEFT JOIN s ON s.account_id = a.account_id
    LEFT JOIN s2 ON s2.account_id = a.account_id
    LEFT JOIN s3 ON s2.related_to = a.account_id
    WHERE c.client_id = '7' AND c.status = 'Active' AND ( a.account_type = 'TEST' OR a.account_type = 'VALUE' OR s2.last_training_on > '2015-01-01 00:00:00') AND (s.fieldValue = 'Medium' OR s.fieldValue = 'Low' OR a.expType = 'Very High')
    ORDER BY a.account_name
    LIMIT 500;

    DROP TEMPORARY TABLE s, s2;

The problem that I am facing now of that the temporary table will create a temporary table of the entire data available in the database which consume lots of time. but my outer query is only looking for 500 records sorted by the a.account_name. If the temporary table has 1 million records that will be waste of time and obviously give me bad performance.

I am looking to find a better way to pass on the clause to the sub query so that way I would only create a temporary table with the needed data for the outer query

Note: these queries are generated dynamic using a GUI. I am unable to figure out how to extract the logic/clause and properly pass them to the sub query.

QUESTIONS

EDITED Here are my table definitions

CREATE TABLE df_answer_predefined ( answer_id int(11) unsigned NOT NULL AUTO_INCREMENT, field_id int(11) unsigned DEFAULT NULL, related_to int(11) unsigned DEFAULT NULL, option_id int(11) unsigned DEFAULT NULL, created_by int(11) unsigned NOT NULL, created_on datetime DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (answer_id), UNIQUE KEY un_row (field_id,option_id,related_to), KEY field_id (field_id), KEY related_to (related_to), KEY to_delete (field_id,related_to), KEY outter_view (field_id,option_id,related_to) ) ENGINE=InnoDB AUTO_INCREMENT=4946214 DEFAULT CHARSET=utf8;

 `CREATE TABLE df_fields_options (
  option_id int(11) unsigned NOT NULL AUTO_INCREMENT,
  field_id int(11) unsigned NOT NULL,
  label varchar(255) DEFAULT NULL,
  is_place_holder tinyint(1) NOT NULL DEFAULT '0',
  is_default tinyint(1) NOT NULL DEFAULT '0',
  sort smallint(3) NOT NULL DEFAULT '1',
  status tinyint(1) NOT NULL DEFAULT '1',
  PRIMARY KEY (option_id),
  KEY i (field_id),
  KEY d (option_id,field_id,is_place_holder)
) ENGINE=InnoDB AUTO_INCREMENT=155 DEFAULT CHARSET=utf8;`


`CREATE TABLE df_field_to_client_relation (
  relation_id int(11) unsigned NOT NULL AUTO_INCREMENT,
  client_id int(11) unsigned DEFAULT NULL,
  field_id int(11) unsigned DEFAULT NULL,
  PRIMARY KEY (relation_id),
  UNIQUE KEY unique_row (field_id,client_id),
  KEY client_id (client_id),
  KEY flient_id (field_id)
) ENGINE=InnoDB AUTO_INCREMENT=26 DEFAULT CHARSET=utf8;`


`CREATE TABLE df_answers_text (
  answer_id int(11) unsigned NOT NULL AUTO_INCREMENT,
  notes varchar(20000) DEFAULT NULL,
  datetime_value datetime DEFAULT NULL,
  date_value date DEFAULT NULL,
  us_phone_number char(10) DEFAULT NULL,
  field_id int(11) unsigned DEFAULT NULL,
  related_to int(11) unsigned DEFAULT NULL,
  created_by int(11) unsigned NOT NULL,
  created_on datetime DEFAULT CURRENT_TIMESTAMP,
  modified_by int(11) DEFAULT NULL,
  modified_on datetime DEFAULT NULL,
  big_unsigned_value bigint(20) DEFAULT NULL,
  big_signed_value bigint(19) DEFAULT NULL,
  unsigned_value int(11) DEFAULT NULL,
  signed_value int(10) DEFAULT NULL,
  decimal_value decimal(18,4) DEFAULT NULL,
  PRIMARY KEY (answer_id),
  UNIQUE KEY unique_answer (field_id,related_to),
  KEY field_id (field_id),
  KEY related_to (related_to),
  KEY big_unsigned_value (big_unsigned_value),
  KEY big_signed_value (big_signed_value),
  KEY unsigned_value (unsigned_value),
  KEY signed_value (signed_value),
  KEY decimal_Value (decimal_value)
) ENGINE=InnoDB AUTO_INCREMENT=2458748 DEFAULT CHARSET=utf8;`

The query that takes the most time is the third sub query with the alias s3

Here is the execution plan for the query that us taking long time "2 seconds"

enter image description here

Upvotes: 0

Views: 924

Answers (1)

Rick James
Rick James

Reputation: 142298

UNIQUE(a,b,c)
INDEX (a)

DROP the INDEX, since the UNIQUE key is an INDEX and the INDEX is a prefix of the UNIQUE.

PRIMARY KEY(d)
UNIQUE(a,b,c)

Why have d at all? Simply say PRIMARY KEY(a,b,c).

FROM ( SELECT ... )
JOIN ( SELECT ... ) ON ...

optimizes poorly (until 5.6.6). Whenever possible turn JOIN ( SELECT ) into a JOIN with the table. As you suggested, using tmp tables may be better, if you can add a suitable index to the tmp table. Best is to try to avoid more than one "table" that is a subquery.

In a many-to-many relation table, don't include an id for the table, instead have only

PRIMARY KEY (a,b),  -- for enforcing uniqueness, providing a PK, and going one direction
INDEX       (b,a)   -- for going the other way.

The EXPLAIN does not seem to match the SELECT you provided. Each is useless without the other.

Another approach that might help... Instead of

SELECT ..., s2.foo, ...
    ...
    JOIN ( SELECT ... FROM x WHERE ... ) AS s2 ON s2.account_id = a.account_id

see if you can reformulate it as:

SELECT ..., 
       ( SELECT foo FROM x WHERE ... AND related = a.account_id) AS foo, ...
    ...

That is, replace the JOIN subquery with a correlated subquery for the one value you need.

The bottom line is that the EAV model sucks.

Hmmm... I don't see the need for this at all, since r is not used elsewhere in he query...

INNER JOIN df_field_to_client_relation AS r ON r.field_id = p.field_id
    WHERE p.field_id = '19' AND r.client_id = '7'

It seems to be equivalent to

WHERE EXISTS ( SELECT * FROM df_field_to_client_relation 
               WHERE field_id = '19' AND client_id = '7' )

but why bother checking for existence?

Upvotes: 1

Related Questions