Reputation: 512
This feels like a bit of a beginner SQL question to me, but here goes. This is what I'm trying to do:
I've described the tables involved at the bottom of this post, but here's the query that I'm performing
SELECT p.*, GROUP_CONCAT(pt.name)
FROM products p
LEFT JOIN product_tags_for_products pt4p ON (pt4p.product_id = p.id)
LEFT JOIN product_tags pt ON (pt.id = pt4p.product_tag_id)
GROUP BY p.id
ORDER BY p.created
LIMIT 30;
There are about 280,000 products, 130 tags, 524,000 linking records and I've ANALYZEd the tables. The problem is that it's taking over 80s to run (on decent hardware), which feels wrong to me.
Here's the EXPLAIN results:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE p index NULL created 4 NULL 30 "Using temporary"
1 SIMPLE pt4p ref idx_product_tags_for_products idx_product_tags_for_products 3 s.id 1 "Using index"
1 SIMPLE pt eq_ref PRIMARY PRIMARY 4 pt4p.product_tag_id 1
I think it's doing things in the wrong order, i.e. ORDERing the results after the join, using a large temporary table, and then LIMITing. The query plan in my head would go something like this:
This sounds simple, but it doesn't seem to work like that - am I missing something?
CREATE TABLE `products` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`rating` float NOT NULL,
`created` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`last_modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`active` tinyint(1) NOT NULL,
PRIMARY KEY (`id`),
KEY `created` (`created`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
CREATE TABLE `product_tags_for_products` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`product_id` mediumint(8) unsigned NOT NULL,
`product_tag_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `idx_product_tags_for_products` (`product_id`,`product_tag_id`),
KEY `product_tag_id` (`product_tag_id`),
CONSTRAINT `product_tags_for_products_ibfk_1` FOREIGN KEY (`product_id`) REFERENCES `products` (`id`),
CONSTRAINT `product_tags_for_products_ibfk_2` FOREIGN KEY (`product_tag_id`) REFERENCES `product_tags` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
CREATE TABLE `product_tags` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Updated with profiling information at Salman A's request:
Status,
Duration,CPU_user,CPU_system,Context_voluntary,Context_involuntary,Block_ops_in,Block_ops_out,Messages_sent,Messages_received,Page_faults_major,Page_faults_minor,Swaps,Source_function,Source_file,Source_line
starting,
0.000124,0.000106,0.000015,0,0,0,0,0,0,0,0,0,NULL,NULL,NULL
"Opening tables",
0.000022,0.000020,0.000003,0,0,0,0,0,0,0,0,0,"unknown function",sql_base.cc,4519
"System lock",
0.000007,0.000004,0.000002,0,0,0,0,0,0,0,0,0,"unknown function",lock.cc,258
"Table lock",
0.000011,0.000009,0.000002,0,0,0,0,0,0,0,0,0,"unknown function",lock.cc,269
init,
0.000055,0.000054,0.000001,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,2524
optimizing,
0.000008,0.000006,0.000002,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,833
statistics,
0.000116,0.000051,0.000066,0,0,0,0,0,0,0,1,0,"unknown function",sql_select.cc,1024
preparing,
0.000027,0.000023,0.000003,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,1046
"Creating tmp table",
0.000054,0.000053,0.000002,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,1546
"Sorting for group",
0.000018,0.000015,0.000003,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,1596
executing,
0.000004,0.000002,0.000001,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,1780
"Copying to tmp table",
0.061716,0.049455,0.013560,0,18,0,0,0,0,0,3680,0,"unknown function",sql_select.cc,1927
"converting HEAP to MyISAM",
0.046731,0.006371,0.017543,3,5,0,3,0,0,0,32,0,"unknown function",sql_select.cc,10980
"Copying to tmp table on disk",
10.700166,3.038211,1.191086,538,1230,1,31,0,0,0,65,0,"unknown function",sql_select.cc,11045
"Sorting result",
0.777887,0.155327,0.618896,2,137,0,1,0,0,0,634,0,"unknown function",sql_select.cc,2201
"Sending data",
0.000336,0.000159,0.000178,0,0,0,0,0,0,0,1,0,"unknown function",sql_select.cc,2334
end,
0.000005,0.000003,0.000002,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,2570
"removing tmp table",
0.106382,0.000058,0.080105,4,9,0,11,0,0,0,0,0,"unknown function",sql_select.cc,10912
end,
0.000015,0.000007,0.000007,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,10937
"query end",
0.000004,0.000002,0.000001,0,0,0,0,0,0,0,0,0,"unknown function",sql_parse.cc,5083
"freeing items",
0.000012,0.000012,0.000001,0,0,0,0,0,0,0,0,0,"unknown function",sql_parse.cc,6107
"removing tmp table",
0.000010,0.000009,0.000001,0,0,0,0,0,0,0,0,0,"unknown function",sql_select.cc,10912
"freeing items",
0.000084,0.000022,0.000057,0,1,0,0,1,0,0,0,0,"unknown function",sql_select.cc,10937
"logging slow query",
0.000004,0.000001,0.000001,0,0,0,0,0,0,0,0,0,"unknown function",sql_parse.cc,1723
"logging slow query",
0.000049,0.000031,0.000018,0,0,0,0,0,0,0,0,0,"unknown function",sql_parse.cc,1733
"cleaning up",
0.000007,0.000005,0.000002,0,0,0,0,0,0,0,0,0,"unknown function",sql_parse.cc,1691
The tables are:
Products = 84.1MiB (there are extra fields in the products table which I omitted for clarity) Tags = 32KiB Linking table = 46.6MiB
Upvotes: 5
Views: 376
Reputation: 50328
Regarding filtering on tags (which you mentioned in the comments on Johan's answer), if the obvious
SELECT p.*, GROUP_CONCAT(pt.name) AS tags
FROM products p
JOIN product_tags_for_products pt4p2 ON (pt4p2.product_id = p.id)
JOIN product_tags pt2 ON (pt2.id = pt4p2.product_tag_id)
LEFT JOIN product_tags_for_products pt4p ON (pt4p.product_id = p.id)
LEFT JOIN product_tags pt ON (pt.id = pt4p.product_tag_id)
WHERE pt2.name IN ('some', 'tags', 'here')
GROUP BY p.id
ORDER BY p.created LIMIT 30
doesn't run fast enough, you could always try this:
CREATE TEMPORARY TABLE products30
SELECT p.*
FROM products p
JOIN product_tags_for_products pt4p ON (pt4p.product_id = p.id)
JOIN product_tags pt ON (pt.id = pt4p.product_tag_id)
WHERE pt.name IN ('some', 'tags', 'here')
GROUP BY p.id
ORDER BY p.created LIMIT 30
SELECT p.*, GROUP_CONCAT(pt.name) AS tags
FROM products30 p
LEFT JOIN product_tags_for_products pt4p ON (pt4p.product_id = p.id)
LEFT JOIN product_tags pt ON (pt.id = pt4p.product_tag_id)
GROUP BY p.id
ORDER BY p.created
(I used a temp table because you said "no subqueries"; I don't know if they're any easier to use in an Active Record framework, but at least it's another way to do it.)
Ps. One really off-the-wall idea about your original problem: would it make any difference if you changed the GROUP BY p.id
clause to GROUP BY p.created, p.id
? Probably not, but I'd at least try it.
Upvotes: 0
Reputation: 1
Ah - I see that none of the keys you GROUP BY on are BTREE, by default PRIMARY keys are hashes. It helps group by when there is an ordering index... otherwise it has to scan...
What I mean is, I think it would help significantly if you added a BTREE based index for p.id and p.created. In that case I think the engine will avoid having to scan/sort all those keys to execute group by and order by.
Upvotes: 0
Reputation: 76557
I would try limiting the number of products to 30 first and then joining with only 30 products:
SELECT p.*, GROUP_CONCAT(pt.name) as tags
FROM (SELECT p30.* FROM products p30 ORDER BY p30.created LIMIT 30) p
LEFT JOIN product_tags_for_products pt4p ON (pt4p.product_id = p.id)
LEFT JOIN product_tags pt ON (pt.id = pt4p.product_tag_id)
GROUP BY p.id
ORDER BY p.created
I know you said no subqueries but you did not explain why, and I don't see any other way to solve your issue.
Note that you can eliminate the subselect by putting that in a view:
CREATE VIEW v_last30products AS
SELECT p30.* FROM products p30 ORDER BY p30.created LIMIT 30;
Then the query is simplified to:
SELECT p.*, GROUP_CONCAT(pt.name) as tags
FROM v_last30products p
LEFT JOIN product_tags_for_products pt4p ON (pt4p.product_id = p.id)
LEFT JOIN product_tags pt ON (pt.id = pt4p.product_tag_id)
GROUP BY p.id
ORDER BY p.created
Other issue, your n-to-n
table product_tags_for_products
Does not make sense, I'd restructure it like so:
CREATE TABLE `product_tags_for_products` (
`product_id` mediumint(8) unsigned NOT NULL,
`product_tag_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`product_id`,`product_tag_id`),
CONSTRAINT `product_tags_for_products_ibfk_1` FOREIGN KEY (`product_id`) REFERENCES `products` (`id`),
CONSTRAINT `product_tags_for_products_ibfk_2` FOREIGN KEY (`product_tag_id`) REFERENCES `product_tags` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
This should make the query faster by:
- shortening the key used (On InnoDB the PK is always included in secondary keys);
- Allowing you to use the PK which should be faster than using a secondary key;
More speed issues
If you replace the select *
with only the fields that you need select p.title, p.rating, ... FROM
that will also speed up things a little.
Upvotes: 3