Sunil Kumar
Sunil Kumar

Reputation: 311

Mysql query with partition taking more time than without partitioning

I have a table with 300k rows. The table a quite heavy so it make slow every query. After trying a lot of index and other optimization I decided to create partitions on the table.

Now I have 3 version of table

  1. e_update
  2. e_update_partition(20 partition using HASH(on event_id))
  3. e_update_partition_event(12 partition with range 25K entries in every partition (on event_id) )

Now I am running same query on every table one by one and comparing timings

SELECT eu.event_id
FROM e_update eu
INNER JOIN event e ON e.id=eu.event_id
WHERE eu.start_date > 2010-10-15
  AND e.published=1
  AND eu.event_id > 25000
  AND eu.event_id < 50000;

Time taken - 189911 rows in set, 2 warnings (14.43 sec)

SELECT eu.event_id
FROM e_update_partition eu
INNER JOIN event e ON e.id=eu.event_id
WHERE eu.start_date > 2010-10-15
  AND e.published=1
  AND eu.event_id > 25000
  AND eu.event_id < 50000;

Time taken - 189911 rows in set, 2 warnings (15.87 sec)

Explain result-

+----+-------------+-------+-----------------------------------------------------------------------+-------+--------------------------------+-----------+---------+--------------------+--------+-----------------------+
| id | select_type | table | partitions                                                            | type  | possible_keys                  | key       | key_len | ref                | rows   | Extra                 |
+----+-------------+-------+-----------------------------------------------------------------------+-------+--------------------------------+-----------+---------+--------------------+--------+-----------------------+
|  1 | SIMPLE      | e     | NULL                                                                  | range | PRIMARY,published              | published | 6       | NULL               | 120674 | Using index condition |
|  1 | SIMPLE      | eu    | p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15,p16,p17,p18,p19 | ref   | event_id,start_date,event_id_2 | event_id  | 4       | biztradeshows.e.id |      1 | Using where           |
+----+-------------+-------+-----------------------------------------------------------------------+-------+--------------------------------+-----------+---------+--------------------+--------+-----------------------+

SELECT eu.event_id
FROM e_update_partition_event eu
INNER JOIN event e ON e.id=eu.event_id
WHERE eu.start_date > 2010-10-15
  AND e.published=1
  AND eu.event_id > 25000
  AND eu.event_id < 50000;

Time taken - 189911 rows in set, 2 warnings (20.56 sec)

Explain result-

+----+-------------+-------+----------------------------------+--------+--------------------------------+-----------+---------+--------------------+--------+-----------------------+
| id | select_type | table | partitions                       | type   | possible_keys                  | key       | key_len | ref                | rows   | Extra                 |
+----+-------------+-------+----------------------------------+--------+--------------------------------+-----------+---------+--------------------+--------+-----------------------+
|  1 | SIMPLE      | e     | NULL                             | range  | PRIMARY,published              | published | 6       | NULL               | 120674 | Using index condition |
|  1 | SIMPLE      | eu    | p3,p4,p5,p6,p7,p8,p9,p10,p11,p12 | eq_ref | event_id,start_date,event_id_2 | event_id  | 4       | biztradeshows.e.id |      1 | Using where           |
+----+-------------+-------+----------------------------------+--------+--------------------------------+-----------+---------+--------------------+--------+-----------------------+

Partition schema for 3rd query

(PARTITION p1 VALUES LESS THAN (25000) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN (50000) ENGINE = InnoDB,
PARTITION p3 VALUES LESS THAN (75000) ENGINE = InnoDB,
PARTITION p4 VALUES LESS THAN (100000) ENGINE = InnoDB,
PARTITION p5 VALUES LESS THAN (125000) ENGINE = InnoDB,
PARTITION p6 VALUES LESS THAN (150000) ENGINE = InnoDB,
PARTITION p7 VALUES LESS THAN (175000) ENGINE = InnoDB,
PARTITION p8 VALUES LESS THAN (200000) ENGINE = InnoDB,
PARTITION p9 VALUES LESS THAN (225000) ENGINE = InnoDB,
PARTITION p10 VALUES LESS THAN (250000) ENGINE = InnoDB,
PARTITION p11 VALUES LESS THAN (275000) ENGINE = InnoDB,
PARTITION p12 VALUES LESS THAN (300000) ENGINE = InnoDB)

Why my 3rd query taking more time than other two Queries and using almost all the partitions ?

Upvotes: 1

Views: 2844

Answers (2)

Rick James
Rick James

Reputation: 142208

BY HASH is useless

Having event_id > ... and BY HASH(event_id) is a totally useless combination. The hashing does not know which values will be in which partition, except one by one. So, it simply assumes all partitions will be needed.

Then, it has to open each partition, perform the lookup, usually not find any desirable values there, then move on to the next partition. Hence it takes more time with PARITIIONing than without. Even with no index on event_id, the non-partitioned version would probably be slightly faster. With INDEX(event_id), the non-partition version is likely to be a lot faster.

I have yet to find any use case wherein BY HASH provides any performance benefit.

Non-partitioned option 1

For the one query you presented, my first guess is to not partition, but I would have

INDEX(start_date),
INDEX(event_id)

The Optimizer would look at its meager statistics and pick between them.

Non-partitioned option 2

Again, assuming that query, my second guess would be this "covering" index:

INDEX(start_date, event_id)

A tip on partitioning: Don't even consider it for tables smaller than a million rows.

More discussion.

2-D Partitioning

That query is essentially a 2-dimensional problem because of two "ranges". But to make partitioning useful, you must use BY RANGE, not BY HASH. So, partition by either

 BY RANGE(TO_DAYS(start_date))  together with
 PRIMARY KEY(event_id, ..., start_date)

or

 BY RANGE(event_id)  together with
 PRIMARY KEY(start_date, ..., event_id)

Be sure to use InnoDB to take advantage of its clustering on the PK. (My link, above, discusses some of the issues with the moving time as a partition key.)

Upvotes: 2

e4c5
e4c5

Reputation: 53734

No amount of partitioning is going to help you with this one:

e.published=1 

boolean fields cannot be indexed effectively. Why? because they have only one of two values. This looks like a mutable field (one that you do updates on, since published will probably be flipped on and off). Such a field cannot be used in partitioning either.

Your first option is to combine this published field with another field and create a composite index and hope that it has sufficient cardinality to be a usefull index.

Your second option is to create an archive table and move unpublished items out to the archive table.

BTW, your query has a condition that doesn't make much sense:

 and eu.event_id >25000 and eu.event_id>50000;

This can be shortened to

 and eu.event_id > 50000;

Update

Why are all the partitions being queried? Well your first partition scheme is on a hash partitioning

Partitioning by HASH is used primarily to ensure an even distribution of data among a predetermined number of partitions.

So your data is in all the partitions

The second scheme, if you look closely you will find that two of the partitions are not being used. And those are the partitions left out by your where clause.

So the problem is in your Where clause :-)

Upvotes: 4

Related Questions