Reputation:
I'll try to explain this better on another question. This is the query that I think should work but, off course, MySQL doesn't support this specific subselect query:
select *
from articles a
where a.article_id in
(select f.article_id
from articles f
where f.category_id = a.category_id
order by f.is_sticky, f.published_at
limit 3) /* limit isn't allowed inside a IN subquery */
What I'm trying to archive is this: In an articles table I have several articles for several categories. I need to obtain at most three articles per category (any number of categories).
Here's the data:
CREATE TABLE articles (
article_id int(10) unsigned NOT NULL AUTO_INCREMENT,
category_id int(10) unsigned NOT NULL,
title varchar(100) NOT NULL,
is_sticky boolean NOT NULL DEFAULT 0,
published_at datetime NOT NULL,
PRIMARY KEY (article_id)
);
INSERT INTO articles VALUES
(1, 1, 'foo', 0, '2009-02-06'),
(1, 1, 'bar', 0, '2009-02-07'),
(1, 1, 'baz', 0, '2009-02-08'),
(1, 1, 'qox', 1, '2009-02-09'),
(1, 2, 'foo', 0, '2009-02-06'),
(1, 2, 'bar', 0, '2009-02-07'),
(1, 2, 'baz', 0, '2009-02-08'),
(1, 2, 'qox', 1, '2009-02-09');
What I'm trying to retrieve is the following:
1, 1, qox, 1, 2009-02-09
1, 1, foo, 0, 2009-02-06
1, 1, bar, 0, 2009-02-07
1, 2, qox, 1, 2009-02-09
1, 2, foo, 0, 2009-02-06
1, 2, bar, 0, 2009-02-07
Notice how 'quox' jumped to the first place in it's category because it is sticky.
Can you figure out a way to avoid the LIMIT inside the subquery?
Thanks
Upvotes: 2
Views: 300
Reputation: 13211
Have a look at this code snippet called Within-group quotas (Top N per group).
Depending on the size of your set, there's two solutions proposed, one playing with a count and the other one using a temporary table for bigger tables.
So basically, if you have a big table, until MySQL implements a LIMIT in subqueries or something similar, you'll have to manually (well, or using a dynamic query in a loop) aggregate all your categories with one of the proposed solutions here.
// A solution using a temporary table and a stored procedure:
Run that once:
DELIMITER //
CREATE PROCEDURE top_articles()
BEGIN
DECLARE done INT DEFAULT 0;
DECLARE catid INT;
DECLARE cur1 CURSOR FOR SELECT DISTINCT(category_id) FROM articles;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
OPEN cur1;
# This temporary table will hold all top N article_id for each category
CREATE TEMPORARY TABLE top_articles (
article_id int(10) unsigned NOT NULL
);
# Loop through each category
REPEAT
FETCH cur1 INTO catid;
INSERT INTO top_articles
SELECT article_id FROM articles
WHERE category_id = catid
ORDER BY is_sticky DESC, published_at
LIMIT 3;
UNTIL done END REPEAT;
# Get all fields in correct order based on our temporary table
SELECT * FROM articles WHERE article_id
IN (SELECT article_id FROM top_articles)
ORDER BY category_id, is_sticky DESC, published_at;
# Remove our temporary table
DROP TEMPORARY TABLE top_articles;
END;
//
DELIMITER ;
And then, to try it out:
CALL top_articles();
You should see the results you were waiting for. And it should work for any number of articles per category, and any number of categories easily. This is what I'm getting:
+------------+-------------+-------+-----------+---------------------+
| article_id | category_id | title | is_sticky | published_at |
+------------+-------------+-------+-----------+---------------------+
| 5 | 1 | qox | 1 | 2009-02-09 00:00:00 |
| 1 | 1 | foo | 0 | 2009-02-06 00:00:00 |
| 2 | 1 | foo | 0 | 2009-02-06 00:00:00 |
| 9 | 2 | qox | 1 | 2009-02-09 00:00:00 |
| 6 | 2 | foo | 0 | 2009-02-06 00:00:00 |
| 7 | 2 | bar | 0 | 2009-02-07 00:00:00 |
+------------+-------------+-------+-----------+---------------------+
Although I don't know how it would translate performance-wise. It probably can be optimized and cleaned out a little bit.
Upvotes: 1
Reputation: 12070
This is a simplification of your solution
select *
from articles a
where a.article_id =
(select f.article_id
from articles f
where f.category_id = a.category_id
order by f.is_sticky, f.published_at
limit 1) or a.article_id =
(select f.article_id
from articles f
where f.category_id = a.category_id
order by f.is_sticky, f.published_at
limit 1, 1) or
a.article_id =
(select f.article_id
from articles f
where f.category_id = a.category_id
order by f.is_sticky, f.published_at
limit 2, 1)
Upvotes: 1
Reputation:
I've found a (horrible, horrible) workaround that I probably shouldn't even post but...
select *
from articles a
where a.article_id =
(select f.article_id
from articles f
where f.category_id = a.category_id
order by f.is_sticky, f.published_at
limit 1)
union
select *
from articles a
where a.article_id =
(select f.article_id
from articles f
where f.category_id = a.category_id
order by f.is_sticky, f.published_at
limit 1, 1)
union
select *
from articles a
where a.article_id =
(select f.article_id
from articles f
where f.category_id = a.category_id
order by f.is_sticky, f.published_at
limit 2, 1)
order by category_id
Since I only needed three articles per category I can repeat the query three times (instead of repeating it for each category) one for the first article in all categories, one for the second article and one for the third article in all categories and the join them all and order by the category.
It seems LIMIT isn't supported in combination with IN but works just fine to retrieve one record at a time.
If you have a better way I'm still interested in your solution.
Thanks
Upvotes: 0