Matt

Reputation: 21

Optimising a query that uses index merge by intersection

I have a MySQL 8 database table accounts that has the following columns:

EDIT: See bottom for complete table structure.

Now, imagine the following SQL query:

SELECT
    COUNT(`id`) AS AGGREGATE
FROM
    `accounts`
WHERE
    `city_id` = 1
AND 
    `country_id` = 7
AND 
    `age` = 3

At 1 million records, this query becomes slow (~200ms).

When running EXPLAIN, I receive the following output:

           id: 1
  select_type: SIMPLE
        table: accounts
   partitions: NULL
         type: index_merge
possible_keys: accounts_city_id_foreign,accounts_country_id_foreign,accounts_age_index
          key: accounts_city_id_foreign,accounts_country_id_foreign,accounts_age_index
      key_len: 9,2,9
          ref: NULL
         rows: 15542
     filtered: 100.00
        Extra: Using intersect(accounts_city_id_foreign, accounts_country_id_foreign, accounts_age_index); Using where; Using index

Given that MySQL appears to be using the indexes, I'm not sure what I can do to bring the execution time down. Does anyone have any ideas?

EDIT: In the future, the table will include more columns, which will make it impossible to use a single composite index covering them all, as that would exceed the 16-column limit.

EDIT: Here's the complete table structure:

CREATE TABLE `accounts` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `city_id` bigint unsigned DEFAULT NULL,
  `school_id` bigint unsigned DEFAULT NULL,
  `country_id` bigint unsigned DEFAULT NULL,
  `province_id` bigint unsigned DEFAULT NULL,
  `age` tinyint unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `accounts_city_id_foreign` (`city_id`),
  KEY `accounts_school_id_foreign` (`school_id`),
  KEY `accounts_country_id_foreign` (`country_id`),
  KEY `accounts_province_id_foreign` (`province_id`),
  KEY `accounts_age_index` (`age`),
  CONSTRAINT `accounts_city_id_foreign` FOREIGN KEY (`city_id`) REFERENCES `cities` (`id`) ON DELETE SET NULL,
  CONSTRAINT `accounts_country_id_foreign` FOREIGN KEY (`country_id`) REFERENCES `countries` (`id`) ON DELETE SET NULL,
  CONSTRAINT `accounts_province_id_foreign` FOREIGN KEY (`province_id`) REFERENCES `provinces` (`id`) ON DELETE SET NULL,
  CONSTRAINT `accounts_school_id_foreign` FOREIGN KEY (`school_id`) REFERENCES `schools` (`id`) ON DELETE SET NULL
) ENGINE=InnoDB AUTO_INCREMENT=1000002 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Upvotes: 1

Views: 644

Answers (3)

Bill Karwin

Reputation: 562881

You asked what will bring the query time down, and using a composite index will do that. Searching a single composite index is faster than searching several single-column indexes and performing an intersection merge on the results.

You commented that you will be adding more columns in the future, and there will eventually be more than 16 columns.

You don't have to add ALL the columns to the composite index!

Index design is not magic. It follows rules. You will create indexes designed to support specific queries that you need to run. You don't add columns to an index unless they help the given query. You may have multiple composite indexes in the table, created to help different queries.
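For example, a sketch of what that could look like here (the second query and its index are purely hypothetical, just to illustrate that each index is tailored to a query shape you actually run):

-- Index tailored to the query in the question:
--   WHERE city_id = ? AND country_id = ? AND age = ?
CREATE INDEX idx_city_country_age ON accounts (city_id, country_id, age);

-- A hypothetical second query, say
--   SELECT COUNT(id) FROM accounts WHERE school_id = ? AND age = ?
-- would get its own index rather than widening the first one:
CREATE INDEX idx_school_age ON accounts (school_id, age);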

You might like my presentation How to Design Indexes, Really (or the video).

Re your comment:

I won't know every possible query combination ahead of time.

Yes, that's true. You can only create indexes for queries that you know. Other queries will not be optimized. If you need to optimize queries in the future, you might need to add new indexes to support them.

In my experience, this happens regularly, and I address this in the presentation. You will review your queries from time to time, because of course your application code changes and the queries you need change. You may add new indexes, or replace an index with a different index, or drop indexes that are no longer needed.

Upvotes: 0

DRapp

Reputation: 48179

Indexes are there to help your querying. So, as suggested by Marko and agreed by others, having an index on (city_id, country_id, age) should significantly help. Now, yes, you will add other columns to the table, but are you really trying to filter on 16+ criteria? I doubt it. And of the queries you would be running, even if you have multiple composite indexes to help optimize them, how many columns might you need at any single time? 4, 5, 6? Beyond that, how granular do you plan on getting with your data? Country, State/Province, City, Town, Village, Neighborhood, Street, House? By the time you are that low in the data, you would be down at the page level anyhow, wouldn't you?

So, your filter of country_id = 7 already chops off a ton of rows. Then narrowing to a given city within that country? Great, now you are down to a small, manageable set.

If you are going to be running queries against large data sets that require aggregation, and the data is fairly fixed from a historical perspective, then having pre-aggregated tables keyed by some common elements might help long term.
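A rough sketch of what such a pre-aggregated table could look like for this case (table and column names here are only illustrative assumptions):

-- Hypothetical summary table keyed by the common filter columns.
CREATE TABLE accounts_summary (
  city_id       bigint unsigned NOT NULL,
  country_id    bigint unsigned NOT NULL,
  age           tinyint unsigned NOT NULL,
  account_count int unsigned NOT NULL,
  PRIMARY KEY (city_id, country_id, age)
);

-- Rebuilt (or incrementally maintained) from the base table:
INSERT INTO accounts_summary (city_id, country_id, age, account_count)
SELECT city_id, country_id, age, COUNT(*)
FROM accounts
WHERE city_id IS NOT NULL AND country_id IS NOT NULL AND age IS NOT NULL
GROUP BY city_id, country_id, age;

-- The original count then becomes a primary-key lookup:
SELECT account_count FROM accounts_summary
WHERE city_id = 1 AND country_id = 7 AND age = 3;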

FEEDBACK

The performance of querying is not necessarily where you will be hit; it will be in the inserts, updates, and deletes, since whatever changes has to update every index on the table, single-column or composite. If you are getting more than 5 columns in an index, ask yourself how granular the index really needs to be. Querying the data should be very fast with proper indexes. Updating indexes is also quick, but if you are dealing with millions of inserts in a month, quarter, or year, each user may only see a slight delay (a quarter of a second?), yet that overhead multiplied across a million writes starts to add up. But again, consider over what period of time those inserts/updates/deletes would happen anyhow.

Upvotes: 1

Marko

Reputation: 1000

Try creating a composite index on all three columns, e.g. CREATE INDEX idx_city_country_age ON accounts (city_id, country_id, age).
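Once the index exists, re-running EXPLAIN on the original query should show whether the optimizer actually picks it up; the expected (though not guaranteed) outcome is an access type of ref on the new index instead of index_merge:

-- After creating idx_city_country_age as above:
EXPLAIN
SELECT COUNT(`id`) AS AGGREGATE
FROM `accounts`
WHERE `city_id` = 1
  AND `country_id` = 7
  AND `age` = 3;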

Upvotes: 3
