Reputation: 33
The following query takes approximately 12 seconds to execute. I have tried optimizing but was not able to. The table to be joined is pretty large (> 8.000.000 records).
SELECT
p0_.id AS id_0,
p0_.ean AS ean_1,
p0_.brand AS brand_2,
p0_.type AS type_3,
p0_.retail_price AS retail_price_4,
p0_.target_price AS target_price_5,
min(NULLIF(c1_.delivery_price, 0)) AS sclr_6,
COALESCE(((p0_.target_price - min(NULLIF(c1_.delivery_price, 0))) / p0_.target_price * -100), 0) AS sclr_7
FROM product p0_
LEFT JOIN crawl c1_ ON (
c1_.product_ean = p0_.ean AND (
c1_.crawl_date = p0_.last_crawl_date OR
p0_.last_crawl_date IS NULL
)
AND c1_.source_id IN (
SELECT o2_.source_id AS sclr_8
FROM organisation_source o2_
WHERE o2_.organisation_id = 5
)
)
WHERE p0_.organisation_id = 5 GROUP BY p0_.ean
I already tried writing the query in a lot of different ways, but unfortunately did not give me any performance win. If I remove the subquery in the last AND it does not help either.
See below the output of the EXPLAIN statement:
+------+--------------+-------+------+---------------------------------------------------+------------------+---------+------------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------+-------+------+---------------------------------------------------+------------------+---------+------------------------+--------+-------------+
| 1 | PRIMARY | p0_ | ref | uniqueConstraint,IDX_D34A04AD9E6B1585 | uniqueConstraint | 5 | const | 69 | Using where |
| 1 | PRIMARY | c1_ | ref | IDX_product_ean,IDX_crawl_date | IDX_product_ean | 62 | admin_pricev-p.p0_.ean | 468459 | Using where |
| 2 | MATERIALIZED | o2_ | ref | PRIMARY,IDX_DD91A56E9E6B1585,IDX_DD91A56E953C1C61 | PRIMARY | 4 | const | 1 | Using index |
+------+--------------+-------+------+---------------------------------------------------+------------------+---------+------------------------+--------+-------------+
See below the CREATE TABLE statements of the product and crawl tabel:
CREATE TABLE `product` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`organisation_id` int(11) DEFAULT NULL,
`ean` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
`brand` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`type` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`retail_price` decimal(10,2) NOT NULL,
`target_price` decimal(10,2) NOT NULL,
`last_crawl_date` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `uniqueConstraint` (`organisation_id`,`ean`),
KEY `IDX_D34A04AD9E6B1585` (`organisation_id`),
KEY `IDX_target_price` (`target_price`),
KEY `IDX_ean` (`ean`),
KEY `IDX_type` (`type`),
KEY `IDX_last_crawl_date` (`last_crawl_date`),
CONSTRAINT `FK_D34A04AD9E6B1585` FOREIGN KEY (`organisation_id`) REFERENCES `organisation` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=927 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
CREATE TABLE `crawl` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`source_id` int(11) DEFAULT NULL,
`store_id` int(11) DEFAULT NULL,
`product_ean` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
`crawl_date` datetime NOT NULL,
`takeaway_price` decimal(10,2) DEFAULT NULL,
`delivery_price` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `IDX_B4E9F1C2953C1C61` (`source_id`),
KEY `IDX_B4E9F1C2B092A811` (`store_id`),
KEY `IDX_product_ean` (`product_ean`),
KEY `IDX_takeaway_price` (`takeaway_price`),
KEY `IDX_crawl_date` (`crawl_date`),
CONSTRAINT `FK_B4E9F1C2953C1C61` FOREIGN KEY (`source_id`) REFERENCES `source` (`id`),
CONSTRAINT `FK_B4E9F1C2B092A811` FOREIGN KEY (`store_id`) REFERENCES `store` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=8606874 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Anyone has any idea how to improve the performance of this query? Many thanks! If more information is needed please let me know!
Upvotes: 3
Views: 769
Reputation: 10216
Try with composite indexes on your LEFT JOIN
s
CREATE TABLE `product` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`organisation_id` int(11) DEFAULT NULL,
`ean` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
`brand` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`type` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`retail_price` decimal(10,2) NOT NULL,
`target_price` decimal(10,2) NOT NULL,
`last_crawl_date` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `uniqueConstraint` (`organisation_id`,`ean`),
KEY `IDX_D34A04AD9E6B1585` (`organisation_id`),
KEY `IDX_target_price` (`target_price`),
KEY `IDX_ean` (`ean`),
KEY `IDX_type` (`type`),
KEY `IDX_last_crawl_date` (`last_crawl_date`),
INDEX `IDX_testing1` (`ean`,`last_crawl_date`),
CONSTRAINT `FK_D34A04AD9E6B1585` FOREIGN KEY (`organisation_id`) REFERENCES `organisation` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=927 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
CREATE TABLE `crawl` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`source_id` int(11) DEFAULT NULL,
`store_id` int(11) DEFAULT NULL,
`product_ean` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
`crawl_date` datetime NOT NULL,
`takeaway_price` decimal(10,2) DEFAULT NULL,
`delivery_price` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `IDX_B4E9F1C2953C1C61` (`source_id`),
KEY `IDX_B4E9F1C2B092A811` (`store_id`),
KEY `IDX_product_ean` (`product_ean`),
KEY `IDX_takeaway_price` (`takeaway_price`),
KEY `IDX_crawl_date` (`crawl_date`),
INDEX `IDX_testing2` ( `source_id`,`product_ean`,`crawl_date`),
CONSTRAINT `FK_B4E9F1C2953C1C61` FOREIGN KEY (`source_id`) REFERENCES `source` (`id`),
CONSTRAINT `FK_B4E9F1C2B092A811` FOREIGN KEY (`store_id`) REFERENCES `store` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=8606874 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Upvotes: 1
Reputation: 1269973
You can probably simplify the query to:
SELECT . . .
FROM product p0_ LEFT JOIN
crawl c1_
ON c1_.product_ean = p0_.ean AND
c1_.crawl_date = p0_.last_crawl_date AND
EXISTS (SELECT 1
FROM organisation_source o2_
WHERE o2_.organisation_id = 5 AND c1_.source_id = o2_.source_id
)
WHERE p0_.organisation_id = 5
GROUP BY p0_.ean;
The p0_.last_crawl_date IS NULL
is presumably unnecessary. A LEFT JOIN
will keep all rows in the first table even when there is a NULL
in a comparison. Your logic matches all rows in the second table (that meet the other conditions). That may be what you want, but I am guessing not.
In MySQL, exists
is sometimes faster than in
, which is why I've rewritten that portion.
For this query, you can speed it up using indexes: product(organisation_id, ean, last_crawl_date)
, crawl(product_ean, crawl_date, source_id)
and organisation_source(source_id, organisation_id)
.
Upvotes: 2