Barry Chapman
Barry Chapman

Reputation: 6780

MySQL Select with several joins doing full table scan on joined table first

I have the following query:

SELECT 
  Impressions.id AS `Impressions__id`, 
  Impressions.timestamp AS `Impressions__timestamp`, 
  Impressions.name AS `Impressions__name`, 
  Impressions.lat AS `Impressions__lat`, 
  Impressions.lng AS `Impressions__lng`, 
  Impressions.personas_count AS `Impressions__personas_count`, 
  Impressions.modified AS `Impressions__modified`, 
  Beacons.id AS `Beacons__id`, 
  Beacons.uuid AS `Beacons__uuid`, 
  Beacons.major AS `Beacons__major`, 
  Beacons.minor_dec AS `Beacons__minor_dec`, 
  Beacons.minor_hex AS `Beacons__minor_hex`, 
  Beacons.impressions_count AS `Beacons__impressions_count`, 
  Beacons.created AS `Beacons__created`, 
  Beacons.modified AS `Beacons__modified`, 
  Zones.id AS `Zones__id`, 
  Zones.location_id AS `Zones__location_id`, 
  Zones.beacon_id AS `Zones__beacon_id`, 
  Zones.fixture_no AS `Zones__fixture_no`, 
  Zones.placement AS `Zones__placement`, 
  Zones.floor AS `Zones__floor`, 
  Zones.impressions_count AS `Zones__impressions_count`, 
  Zones.ignore_further_incidents AS `Zones__ignore_further_incidents`, 
  Zones.is_reviewed AS `Zones__is_reviewed`, 
  Zones.review_date AS `Zones__review_date`, 
  Zones.created AS `Zones__created`, 
  Zones.modified AS `Zones__modified`, 
  Locations.id AS `Locations__id`, 
  Locations.retailer_id AS `Locations__retailer_id`, 
  Locations.google_place_id AS `Locations__google_place_id`, 
  Locations.regional_name AS `Locations__regional_name`, 
  Locations.location AS `Locations__location`, 
  Locations.store_no AS `Locations__store_no`, 
  Locations.lat AS `Locations__lat`, 
  Locations.lng AS `Locations__lng`, 
  Locations.address1 AS `Locations__address1`, 
  Locations.address2 AS `Locations__address2`, 
  Locations.address3 AS `Locations__address3`, 
  Locations.city AS `Locations__city`, 
  Locations.state AS `Locations__state`, 
  Locations.postal_code AS `Locations__postal_code`, 
  Locations.region_id AS `Locations__region_id`, 
  Locations.country_id AS `Locations__country_id`, 
  Locations.zones_count AS `Locations__zones_count`, 
  Locations.contacts_count AS `Locations__contacts_count`, 
  Locations.created AS `Locations__created`, 
  Locations.modified AS `Locations__modified`, 
  Devices.id AS `Devices__id`, 
  Devices.os AS `Devices__os`, 
  Devices.bluetooth_enabled AS `Devices__bluetooth_enabled`, 
  Devices.impressions_count AS `Devices__impressions_count`, 
  Devices.modified AS `Devices__modified`, 
  Regions.id AS `Regions__id`, 
  Regions.country_name AS `Regions__country_name`, 
  Regions.subdiv AS `Regions__subdiv`, 
  Regions.subdiv_name AS `Regions__subdiv_name`, 
  Regions.level_name AS `Regions__level_name`, 
  Regions.alt_names AS `Regions__alt_names`, 
  Regions.subdiv_star AS `Regions__subdiv_star`, 
  Regions.subdiv_id AS `Regions__subdiv_id`, 
  Regions.country_id AS `Regions__country_id`, 
  Regions.country_code_2 AS `Regions__country_code_2`, 
  Regions.country_code_3 AS `Regions__country_code_3`, 
  Countries.id AS `Countries__id`, 
  Countries.country_name AS `Countries__country_name`, 
  Countries.alt_names AS `Countries__alt_names`, 
  Countries.code2 AS `Countries__code2`, 
  Countries.code3 AS `Countries__code3`, 
  Countries.iso_cc AS `Countries__iso_cc`, 
  Countries.fips_code AS `Countries__fips_code`, 
  Countries.fips_country_name AS `Countries__fips_country_name`, 
  Countries.un_region AS `Countries__un_region`, 
  Countries.un_subregion AS `Countries__un_subregion`, 
  Countries.comments AS `Countries__comments`, 
  Countries.created AS `Countries__created`, 
  Countries.modified AS `Countries__modified` 
FROM 
  impressions Impressions 
  inner join beacons Beacons ON Beacons.id = (Impressions.beacon_id) 
  inner JOIN zones Zones ON Zones.id = (Impressions.zone_id) 
  inner JOIN devices Devices ON Devices.id = (Impressions.device_id) 
  INNER JOIN locations Locations ON Locations.id = (Zones.location_id) 
  LEFT JOIN regions Regions ON Regions.id = (Locations.region_id) 
  LEFT JOIN countries Countries ON Countries.id = (Locations.country_id) 
ORDER BY 
  Impressions.timestamp desc 
LIMIT 
  15 OFFSET 15

This query takes about 6 seconds to run. The EXPLAIN output is as follows:

+----+-------------+-------------+--------+---------------------------------------+----------------+---------+---------------------------------+-------+---------------------------------+
| id | select_type | table       | type   | possible_keys                         | key            | key_len | ref                             | rows  | Extra                           |
+----+-------------+-------------+--------+---------------------------------------+----------------+---------+---------------------------------+-------+---------------------------------+
|  1 | SIMPLE      | Devices     | ALL    | PRIMARY                               | NULL           | NULL    | NULL                            | 43274 | Using temporary; Using filesort |
|  1 | SIMPLE      | Impressions | ref    | zone_idx,device_id_idx2,beacon_id_idx | device_id_idx2 | 8       | gen1_d2go.Devices.id            |     3 | NULL                            |
|  1 | SIMPLE      | Zones       | eq_ref | PRIMARY,fk_location_idx,comp          | PRIMARY        | 8       | gen1_d2go.Impressions.zone_id   |     1 | NULL                            |
|  1 | SIMPLE      | Beacons     | eq_ref | PRIMARY                               | PRIMARY        | 8       | gen1_d2go.Impressions.beacon_id |     1 | NULL                            |
|  1 | SIMPLE      | Locations   | eq_ref | PRIMARY                               | PRIMARY        | 8       | gen1_d2go.Zones.location_id     |     1 | NULL                            |
|  1 | SIMPLE      | Regions     | eq_ref | PRIMARY                               | PRIMARY        | 4       | gen1_d2go.Locations.region_id   |     1 | NULL                            |
|  1 | SIMPLE      | Countries   | eq_ref | PRIMARY                               | PRIMARY        | 4       | gen1_d2go.Locations.country_id  |     1 | NULL                            |
+----+-------------+-------------+--------+---------------------------------------+----------------+---------+---------------------------------+-------+---------------------------------+
7 rows in set (0.00 sec)

I do not understand why it is favoring the full scan of the Devices table. The tables are all indexed, and the CREATE statements for Impressions and Devices is as follows:

Impressions

CREATE TABLE `impressions` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `device_id` bigint(20) unsigned NOT NULL,
  `beacon_id` bigint(20) unsigned NOT NULL,
  `zone_id` bigint(20) unsigned NOT NULL,
  `timestamp` datetime NOT NULL,
  `google_place_id` bigint(20) unsigned DEFAULT NULL,
  `name` varchar(60) DEFAULT NULL,
  `lat` decimal(12,7) DEFAULT NULL,
  `lng` decimal(12,7) DEFAULT NULL,
  `personas_count` int(10) unsigned DEFAULT '0',
  `created` datetime DEFAULT NULL,
  `modified` datetime DEFAULT NULL,
  PRIMARY KEY (`id`,`timestamp`),
  KEY `zone_idx` (`zone_id`),
  KEY `device_id_idx2` (`device_id`),
  KEY `beacon_id_idx` (`beacon_id`),
  KEY `timestamp_idx` (`id`,`timestamp`),
  KEY `ALL` (`id`,`timestamp`,`name`,`lat`,`lng`,`personas_count`,`modified`),
  CONSTRAINT `beacon_id` FOREIGN KEY (`beacon_id`) REFERENCES `beacons` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
  CONSTRAINT `device2` FOREIGN KEY (`device_id`) REFERENCES `devices` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
  CONSTRAINT `zone_FK` FOREIGN KEY (`zone_id`) REFERENCES `zones` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=303907 DEFAULT CHARSET=utf8;

DEVICES

CREATE TABLE `devices` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `device_id` bigint(20) unsigned NOT NULL,
  `advertiser_id` char(36) NOT NULL,
  `os` varchar(80) DEFAULT NULL,
  `bluetooth_enabled` tinyint(1) DEFAULT NULL,
  `impressions_count` int(10) unsigned DEFAULT '0',
  `created` datetime DEFAULT NULL,
  `modified` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `advertiser_idx` (`advertiser_id`),
  KEY `ad_dev` (`device_id`,`advertiser_id`),
  KEY `device_id` (`device_id`)
) ENGINE=InnoDB AUTO_INCREMENT=53628 DEFAULT CHARSET=utf8;

The kicker is:

When i use FORCE INDEX (timestamp_idx) after the FROM impressions, it works great. It uses that index, and runs in about 0.078s. I do not know why it is trying to avoid using that index, or doing a select from that table first at all.

UPDATED

Including EXPLAIN with FORCE INDEX

Current database: gen1_d2go

+----+-------------+-------------+--------+------------------------------+---------------+---------+---------------------------------+------+-------+
| id | select_type | table       | type   | possible_keys                | key           | key_len | ref                             | rows | Extra |
+----+-------------+-------------+--------+------------------------------+---------------+---------+---------------------------------+------+-------+
|  1 | SIMPLE      | Impressions | index  | NULL                         | timestamp_idx | 5       | NULL                            |   15 | NULL  |
|  1 | SIMPLE      | Zones       | eq_ref | PRIMARY,fk_location_idx,comp | PRIMARY       | 8       | gen1_d2go.Impressions.zone_id   |    1 | NULL  |
|  1 | SIMPLE      | Beacons     | eq_ref | PRIMARY                      | PRIMARY       | 8       | gen1_d2go.Impressions.beacon_id |    1 | NULL  |
|  1 | SIMPLE      | Locations   | eq_ref | PRIMARY                      | PRIMARY       | 8       | gen1_d2go.Zones.location_id     |    1 | NULL  |
|  1 | SIMPLE      | Regions     | eq_ref | PRIMARY                      | PRIMARY       | 4       | gen1_d2go.Locations.region_id   |    1 | NULL  |
|  1 | SIMPLE      | Countries   | eq_ref | PRIMARY                      | PRIMARY       | 4       | gen1_d2go.Locations.country_id  |    1 | NULL  |
|  1 | SIMPLE      | Devices     | eq_ref | PRIMARY                      | PRIMARY       | 8       | gen1_d2go.Impressions.device_id |    1 | NULL  |
+----+-------------+-------------+--------+------------------------------+---------------+---------+---------------------------------+------+-------+
7 rows in set (0.01 sec)

Upvotes: 2

Views: 1261

Answers (2)

Francis Manoj Fernnado
Francis Manoj Fernnado

Reputation: 262

regarding the full table scan you can't optimize it even more. refer here for more information https://cryptkcoding.com/blog/2012/04/06/how-to-optimize-mysql-join-queries-through-indexing/

Upvotes: 0

Rick James
Rick James

Reputation: 142433

Impressions needs an index starting with timestamp. That way, the optimizer will hopefully decide to scan through Impressions in timestamp order, thereby avoiding the sort, etc.

A side lesson... You have 3 indexes starting with id, timestamp. One is the PRIMARY KEY. That means that the other two are unnecessary.

An additional speedup can be had thus:

ALTER TABLE Impressions
    DROP INDEX timestamp_idx,  -- as already mentioned
    DROP INDEX ALL,            -- ditto
    DROP PRIMARY KEY,          -- to rearrange it
    ADD PRIMARY KEY(timestamp, id),  -- thus
    ADD INDEX(id);             -- and keep AUTO_INCREMENT happy

Why? By having the PK start with timestamp, the query can scan the data instead of hopping between some index and the data. This will speed up the query in question. Caveat: It may hurt other queries.

Other notes...

CHAR(36) smells like a UUID, correct? But with utf8, it takes exactly 108 bytes!. Change to CHAR(36) CHARACTER SET ascii NOT NULL so that it will take only 36 bytes. (Or you could convert to `BINARY(16) for more savings; but that is another story, and more code.)

Unless you have billions of rows, BIGINT (at 8 bytes) is overkill for ids. INT UNSIGNED is only 4 bytes.

Smaller translates into faster in various ways.

Upvotes: 1

Related Questions