Reputation: 4721
I've got a database where each entry is an edge with a source tag, a relationship and a weight. I want to perform a query where given a source tag, I get the top n edges by weight with that source tag per relationship.
For example, given the entries
Id Source Relationship End Weight
-----------------------------------------
1 cat isA feline 56
2 cat isA animal 12
3 cat isA pet 37
4 cat desires food 5
5 cat desires play 88
6 dog isA canine 72
If I queried using "cat" as a source and n=2, the result should be
Id Source Relationship End Weight
-----------------------------------------
1 cat isA feline 56
3 cat isA pet 37
4 cat desires food 5
5 cat desires play 88
I've tried several different approaches based on other questions.
The most sucessful so far is based on How to SELECT the newest four items per category?
SELECT *
FROM tablename t1
JOIN tablename t2 ON (t1.relationship = t2.relationship)
LEFT OUTER JOIN tablename t3
ON (t1.relationship = t3.relationship AND t2.weight < t3.weight)
WHERE t1.source = "cat"
AND t3.relationship IS NULL
ORDER BY t2.weight DESC;
However, this returns all the edges with source="cat" in sorted order. If I try to add LIMIT, I get the edges with the top weights not by group.
The other thing that I have tried is
SELECT *
FROM tablename t1
WHERE t1.source="cat"
AND (
SELECT COUNT(*)
FROM tablename t2
WHERE t1.relationship = t2.relationship
AND t1.weight <= t2.weight
) <= 2;
This returns
Id Source Relationship End Weight
-----------------------------------------
1 cat isA feline 56
4 cat desires food 5
5 cat desires play 88
Because edge 6 has a higher weight for the isA relationship than edge 2, but is excluded from the results because the source="dog"
I am very new to databases, so if I need to take a completely different approach, let me know. I'm not afraid of starting over.
Upvotes: 2
Views: 263
Reputation: 562280
Doing this with the correlated subquery is indeed inefficient, because MySQL has to run the subquery for every row of the outer query, just to decide if the row in the outer query meets the conditions. That's a lot of overhead.
Here's a method using no subquery:
SELECT t1.*
FROM tablename t1
JOIN tablename t2 ON t1.source = t2.source and t1.relationship = t2.relationship
AND t1.weight <= t2.weight
WHERE t1.source = 'cat'
GROUP BY t1.id
HAVING COUNT(*) <= 2;
And here's a method using neither subquery, nor joins/group by:
SELECT *
FROM (
SELECT tablename.*, IF(@r = relationship, @n:=@n+1, @n:=1) AS _n,
@r:=relationship AS _r
FROM (SELECT @r:=null, @n:=1) _init, tablename
WHERE source = 'cat'
ORDER BY relationship, weight DESC
) AS _t
WHERE _n <= 2;
These solutions also need some tiebreaker in case there are multiple rows with the same top weights. But that applies to all the solutions.
The simpler solution, which wouldn't require special gymnastics or tiebreakers, is to use SQL window functions like ROW_NUMBER() OVER (PARTITION BY relationship)
, but MySQL does not support these.
Upvotes: 2
Reputation: 49049
It won't be too efficient, but MySQL allows you to do something like this:
SELECT t1.*
FROM
tablename t1 INNER JOIN (
SELECT SUBSTRING_INDEX(
GROUP_CONCAT(Id ORDER BY Weight DESC),
',',
2) top_2
FROM tablename
WHERE Source='cat'
GROUP BY Relationship) t2
ON FIND_IN_SET(t1.id, t2.top_2);
Please see fiddle here.
Upvotes: 0