Reputation: 7388
This is a slightly harder version of what I posted earlier. I don't want to edit the original to ruin a perfectly good answer to the question I had at the time.
Suppose a table named tau.
tau:
A | B | C
----------
2 - 1 - red
3 - 1 - rod
4 - 1 - rope
6 - 5 - red
7 - 5 - rap
8 - 5 - rod
9 - 10 -rod
11- 10 -road
12- 13 -rud
Column A is the primary key. It will be unique. Column B is a foreign key. In my database no integer key is ever the same across tables. Column C is not a key.
This table will have a lot of rows. There are many other columns, say column C for instance, that are indexed for search.
I want exactly M distinct values from column B. Lets say M = 2
Importantly, I am using column B in the expression (it's indexed too!) to determine which B's to return.
Normally, I would go
select distinct B From tau Where C like 'r_d' AND B < 13 Order By B Desc Limit 2
and I get
B
-----
10
5
This is the current state of affairs, now. But I want to upgrade to a new scenario:
When the expression on B C is satisfied and adds the record to the results pile (C like 'r_d' AND B < 13), I want to return column A as well as column B, while maintaining the restriction of 2 at most distinct column B values.
Another important condition is the solution must work for expressions in the where clause that return true for possibly different values of C.
Behold, the results I want.
A | B
------
9 - 10
8 - 5
6 - 5
The trouble is, the actual limit of records returned can be greater than M, or 2 in the example. I don't really care how many records come through, as long as there are only M distinct values of B. How would I write a query to accomplish this in Postgresql?
Upvotes: 4
Views: 77
Reputation: 32693
DENSE_RANK
should do what you need. Here is SQL Fiddle.
Put your M
into WHERE R.rnk <= 2
instead of 2
.
Rather than starting with M
distinct values from B
and trying to figure out how to add missing values from column A
to the final result, I'm thinking about this problem from the other side. We have some search conditions that limit the whole table to some intermediary result set:
SELECT *
FROM tau
WHERE C LIKE 'r_d' AND B < 13
This intermediary result set has all columns from the table, nothing is aggregated yet. We just need to filter this set further and leave only M
distinct values of the B
column. DENSE_RANK
assigns a number (without gaps) to each group of B
values, so we can use it in the final filter.
SELECT
A, B
FROM
(
SELECT
A
,B
,C
,DENSE_RANK() OVER(ORDER BY B DESC) AS rnk
FROM tau
WHERE C LIKE 'r_d' AND B < 13
) AS R
WHERE R.rnk <= 2
ORDER BY B DESC, A DESC;
result set
A B
9 10
8 5
6 5
Second variant using LATERAL JOIN
SQL Fiddle with both variants.
In the second variant we are finding M
distinct values of B
at first. Then for each found value we run the main query with extra filter for this B
value. If there are millions of different values of B
, then filtering to specific value should be efficient, provided there is an index on B
. If there are millions of rows, but number of total distinct values of B
is small, it should be slow.
WITH
CTE
AS
(
select distinct B
From tau
Where C like 'r_d' AND B < 13
Order By B Desc
Limit 2
)
SELECT
T.A, T.B, T.C
FROM
CTE
INNER JOIN LATERAL
(
SELECT tau.A, tau.B, tau.C
FROM tau
WHERE tau.B = CTE.B AND tau.C like 'r_d' AND tau.B < 13
) AS T ON true
ORDER BY T.B DESC, T.A DESC;
Upvotes: 3