T_01
T_01

Reputation: 1359

Select all rows which match a condition dependent on over rows

Let's assume I have a table T with the columns X, Y, Z where Z is some comparable value (I have to make it work on a PostgreSQL database but a solution that additionally supports oracle would be even nicer).

What I want to achieve is best described by the following, sadly invalid query:

SELECT X, Y, MAX(Z) 
FROM T OVER (PARTITION BY Z) as max_z 
GROUP BY Y 
WHERE Z < z_threshold AND Z = max_z AND X = some_x

This is invalid since window function result cannot be accessed in a where clause as I understood from the (PostgreSQL) documentation.

In other words, for each group of rows with the same y-value I want to select the row which has the maximum Z in that group.

Currently my solution is to just select all rows where X = some_x and Z < z_threshold and then "post-process" the data outside of sql, which is not very efficient with large datasets.

Is it possible to achieve this result with plain sql or is there any other more efficient solution to this?

Upvotes: 1

Views: 1448

Answers (2)

Charlieface
Charlieface

Reputation: 71544

Row-numbering is usually the easiest and most efficient method:

SELECT t.*
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Y ORDER BY Z DESC) rn
    FROM T
) t
WHERE rn = 1

You can use DENSE_RANK instead of ROW_NUMBER if you want tied results

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269763

You just need a subquery:

SELECT t.*
FROM (SELECT X, Y, Z, MAX(Z) OVER (PARTITION BY Y) as max_z
      FROM T
     ) t
WHERE z = max_z;

If there are duplicates maximuma for a given y, this returns all the duplicates. Often ROW_NUMBER() would be used for this purpose, but you started with MAX() so this also works.

Note: In Postgres, I would recommend DISTINCT ON:

select distinct on (y) t.*
from t
order by y, z desc;

Upvotes: 1

Related Questions