Reputation: 311
I have a table with 300k rows. The table a quite heavy so it make slow every query. After trying a lot of index and other optimization I decided to create partitions on the table.
Now I have 3 version of table
Now I am running same query on every table one by one and comparing timings
SELECT eu.event_id
FROM e_update eu
INNER JOIN event e ON e.id=eu.event_id
WHERE eu.start_date > 2010-10-15
AND e.published=1
AND eu.event_id > 25000
AND eu.event_id < 50000;
Time taken - 189911 rows in set, 2 warnings (14.43 sec)
SELECT eu.event_id
FROM e_update_partition eu
INNER JOIN event e ON e.id=eu.event_id
WHERE eu.start_date > 2010-10-15
AND e.published=1
AND eu.event_id > 25000
AND eu.event_id < 50000;
Time taken - 189911 rows in set, 2 warnings (15.87 sec)
Explain result-
+----+-------------+-------+-----------------------------------------------------------------------+-------+--------------------------------+-----------+---------+--------------------+--------+-----------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-----------------------------------------------------------------------+-------+--------------------------------+-----------+---------+--------------------+--------+-----------------------+ | 1 | SIMPLE | e | NULL | range | PRIMARY,published | published | 6 | NULL | 120674 | Using index condition | | 1 | SIMPLE | eu | p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15,p16,p17,p18,p19 | ref | event_id,start_date,event_id_2 | event_id | 4 | biztradeshows.e.id | 1 | Using where | +----+-------------+-------+-----------------------------------------------------------------------+-------+--------------------------------+-----------+---------+--------------------+--------+-----------------------+
SELECT eu.event_id
FROM e_update_partition_event eu
INNER JOIN event e ON e.id=eu.event_id
WHERE eu.start_date > 2010-10-15
AND e.published=1
AND eu.event_id > 25000
AND eu.event_id < 50000;
Time taken - 189911 rows in set, 2 warnings (20.56 sec)
Explain result-
+----+-------------+-------+----------------------------------+--------+--------------------------------+-----------+---------+--------------------+--------+-----------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+----------------------------------+--------+--------------------------------+-----------+---------+--------------------+--------+-----------------------+ | 1 | SIMPLE | e | NULL | range | PRIMARY,published | published | 6 | NULL | 120674 | Using index condition | | 1 | SIMPLE | eu | p3,p4,p5,p6,p7,p8,p9,p10,p11,p12 | eq_ref | event_id,start_date,event_id_2 | event_id | 4 | biztradeshows.e.id | 1 | Using where | +----+-------------+-------+----------------------------------+--------+--------------------------------+-----------+---------+--------------------+--------+-----------------------+
Partition schema for 3rd query
(PARTITION p1 VALUES LESS THAN (25000) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN (50000) ENGINE = InnoDB,
PARTITION p3 VALUES LESS THAN (75000) ENGINE = InnoDB,
PARTITION p4 VALUES LESS THAN (100000) ENGINE = InnoDB,
PARTITION p5 VALUES LESS THAN (125000) ENGINE = InnoDB,
PARTITION p6 VALUES LESS THAN (150000) ENGINE = InnoDB,
PARTITION p7 VALUES LESS THAN (175000) ENGINE = InnoDB,
PARTITION p8 VALUES LESS THAN (200000) ENGINE = InnoDB,
PARTITION p9 VALUES LESS THAN (225000) ENGINE = InnoDB,
PARTITION p10 VALUES LESS THAN (250000) ENGINE = InnoDB,
PARTITION p11 VALUES LESS THAN (275000) ENGINE = InnoDB,
PARTITION p12 VALUES LESS THAN (300000) ENGINE = InnoDB)
Why my 3rd query taking more time than other two Queries and using almost all the partitions ?
Upvotes: 1
Views: 2844
Reputation: 142208
BY HASH is useless
Having event_id > ...
and BY HASH(event_id)
is a totally useless combination. The hashing does not know which values will be in which partition, except one by one. So, it simply assumes all partitions will be needed.
Then, it has to open each partition, perform the lookup, usually not find any desirable values there, then move on to the next partition. Hence it takes more time with PARITIIONing
than without. Even with no index on event_id
, the non-partitioned version would probably be slightly faster. With INDEX(event_id)
, the non-partition version is likely to be a lot faster.
I have yet to find any use case wherein BY HASH
provides any performance benefit.
Non-partitioned option 1
For the one query you presented, my first guess is to not partition, but I would have
INDEX(start_date),
INDEX(event_id)
The Optimizer would look at its meager statistics and pick between them.
Non-partitioned option 2
Again, assuming that query, my second guess would be this "covering" index:
INDEX(start_date, event_id)
A tip on partitioning: Don't even consider it for tables smaller than a million rows.
More discussion.
2-D Partitioning
That query is essentially a 2-dimensional problem because of two "ranges". But to make partitioning useful, you must use BY RANGE
, not BY HASH
. So, partition by either
BY RANGE(TO_DAYS(start_date)) together with
PRIMARY KEY(event_id, ..., start_date)
or
BY RANGE(event_id) together with
PRIMARY KEY(start_date, ..., event_id)
Be sure to use InnoDB to take advantage of its clustering on the PK. (My link, above, discusses some of the issues with the moving time as a partition key.)
Upvotes: 2
Reputation: 53734
No amount of partitioning is going to help you with this one:
e.published=1
boolean fields cannot be indexed effectively. Why? because they have only one of two values. This looks like a mutable field (one that you do updates on, since published will probably be flipped on and off). Such a field cannot be used in partitioning either.
Your first option is to combine this published
field with another field and create a composite index and hope that it has sufficient cardinality to be a usefull index.
Your second option is to create an archive table and move unpublished items out to the archive table.
BTW, your query has a condition that doesn't make much sense:
and eu.event_id >25000 and eu.event_id>50000;
This can be shortened to
and eu.event_id > 50000;
Why are all the partitions being queried? Well your first partition scheme is on a hash partitioning
Partitioning by HASH is used primarily to ensure an even distribution of data among a predetermined number of partitions.
So your data is in all the partitions
The second scheme, if you look closely you will find that two of the partitions are not being used. And those are the partitions left out by your where clause.
So the problem is in your Where clause :-)
Upvotes: 4