Edouard Berthe
Edouard Berthe

Reputation: 1463

SQL Condition on Window function

I want to do a special request on my database (PostgreSQL v9.4.5), but I don't manage to do it.

In order to simply, let's say I have the following table AvgTemperatures, representing different averages of temperature taken in different cities, and calculated on different length of time (counted in months) :

 id |   city    |  avg | months 
----+-----------+------+--------
  1 |  New-York |   20 |     3   <--- average temperate over the last 3 months
  2 |  New-York |   19 |     6   <--- average temperate over the last 6 months
  3 |  New-York |   15 |    12   <--- etc
  4 |  New-York |   15 |    24
  5 |    Boston |   13 |     3
  6 |    Boston |   18 |     8
  7 |    Boston |   17 |    12
  8 |    Boston |   16 |    15
  9 |   Chicago |   12 |     2
 10 |   Chicago |   14 |    12
 11 |     Miami |   28 |     1
 12 |     Miami |   25 |     4
 13 |     Miami |   21 |    12
 14 |     Miami |   22 |    15
 15 |     Miami |   20 |    24

Now, imagine that I want to select all the rows concerning the measures in a city where at least one average has been over 19 degrees. In this case I want :

 id |   city    |  avg | months 
----+-----------+------+--------
  1 |  New-York |   20 |     3  
  2 |  New-York |   19 |     6  
  3 |  New-York |   15 |    12  
  4 |  New-York |   15 |    24  
 11 |     Miami |   28 |     1  
 12 |     Miami |   25 |     4  
 13 |     Miami |   21 |    12  
 14 |     Miami |   22 |    15  
 15 |     Miami |   20 |    24  

I could do something like :

 SELECT *
 FROM AvgTemperatures
 WHERE MIN(avg) OVER (PARTITION BY city) > 16

But :

********** Erreur **********

ERROR: window functions not allowed in WHERE clause

What's more, I cannot use GROUP BY as in :

 SELECT *
 FROM AvtTemperatures
 GROUP BY city
 HAVING MIN(avg) > 16

because I will lose information due to the aggregation (by the way this query is not valid because of the "SELECT *").

I'm pretty sure I can use the OVER PARTITION BY to solve that, but I don't know how. Does someone have an idea ?

Upvotes: 6

Views: 36261

Answers (5)

Lukasz Szozda
Lukasz Szozda

Reputation: 176189

All-at-once operation:

"All-at-Once Operations" means that all expressions in the same logical query process phase are evaluated logically at the same time.

And great chapter Impact on Window Functions:

Suppose you have:

CREATE TABLE Test ( Id INT) ;
 
INSERT  INTO Test VALUES  ( 1001 ), ( 1002 ) ;

SELECT Id
FROM Test
WHERE Id = 1002
  AND ROW_NUMBER() OVER(ORDER BY Id) = 1;

All-at-Once operations tell us these two conditions evaluated logically at the same point of time. Therefore, SQL Server can evaluate conditions in WHERE clause in arbitrary order, based on estimated execution plan. So the main question here is which condition evaluates first.

Case 1:

If ( Id = 1002 ) is first, then if ( ROW_NUMBER() OVER(ORDER BY Id) = 1 )

Result: 1002

Case 2:

If ( ROW_NUMBER() OVER(ORDER BY Id) = 1 ), then check if ( Id = 1002 )

Result: empty

So we have a paradox.

This example shows why we cannot use Window Functions in WHERE clause. You can think more about this and find why Window Functions are allowed to be used just in SELECT and ORDER BY clauses!

To get what you want you can wrap windowed function with CTE/subquery as in Gordon answer:

;WITH cte AS
(
  SELECT t.*, MAX(AVG) OVER (PARTITION BY city) AS average
  FROM avgTemperatures t
)
SELECT *
FROM cte
where average > 19
ORDER BY id;

db<>fiddle demo

Output:

╔═════╦══════════╦═════╦═════════╗
║ id  ║   city   ║ avg ║ months  ║
╠═════╬══════════╬═════╬═════════╣
║   1 ║ New-York ║  20 ║     3   ║
║   2 ║ New-York ║  19 ║     6   ║
║   3 ║ New-York ║  15 ║    12   ║
║   4 ║ New-York ║  15 ║    24   ║
║  11 ║ Miami    ║  28 ║     1   ║
║  12 ║ Miami    ║  25 ║     4   ║
║  13 ║ Miami    ║  21 ║    12   ║
║  14 ║ Miami    ║  22 ║    15   ║
║  15 ║ Miami    ║  20 ║    24   ║
╚═════╩══════════╩═════╩═════════╝

Upvotes: 9

Clodoaldo Neto
Clodoaldo Neto

Reputation: 125474

The simplest solution is to use the bool_or aggregate function

select id, city, avg, months
from avttemperatures
where city in (
    select city
    from avttemperatures
    group by 1
    having bool_or(avg > 19)
)
order by  2, 4
;
 id |   city   | avg | months 
----+----------+-----+--------
 11 | Miami    |  28 |      1
 12 | Miami    |  25 |      4
 13 | Miami    |  21 |     12
 14 | Miami    |  22 |     15
 15 | Miami    |  20 |     24
  1 | New-York |  20 |      3
  2 | New-York |  19 |      6
  3 | New-York |  15 |     12
  4 | New-York |  15 |     24

The test table:

create table avttemperatures (
    id int, city text, avg int, months int
);
insert into avttemperatures (id, city, avg, months) values
(  1,'New-York',20,3),
(  2,'New-York',19,6),
(  3,'New-York',15,12),
(  4,'New-York',15,24),
(  5,'Boston',13,3),
(  6,'Boston',18,8),
(  7,'Boston',17,12),
(  8,'Boston',16,15),
(  9,'Chicago',12,2),
( 10,'Chicago',14,12),
( 11,'Miami',28,1),
( 12,'Miami',25,4),
( 13,'Miami',21,12),
( 14,'Miami',22,15),
( 15,'Miami',20,24);

Upvotes: 4

joop
joop

Reputation: 4523

No need to aggregate if you only want to know if at least one exists:

SELECT id, city, avg, months
FROM avgtemperatures t
WHERE EXISTS ( SELECT 42
    FROM avgtemperatures x
    WHERE x.city = t.city
    AND x.avg > 19
    )
ORDER BY city,months DESC
   ;

Note: avg is a bad name for a column.

Upvotes: 3

Gordon Linoff
Gordon Linoff

Reputation: 1271013

Use a subquery to get the maximum and then a where:

select t.*
from (select t.*, max(avg) over (partition by city) as maxavg
      from avgTemperatures t
     ) t
where maxavg > 19;

An alternative is to do this in the where clause:

select t.*
from avgTemperatures t
where t.city in (select t2.city from avgTemperatures t2 where t2.avg > 19);

Upvotes: 1

user330315
user330315

Reputation:

You need to wrap this in a derived table to be able to use in the where clause:

select *
from (
  SELECT t.*, MIN(avg) OVER (PARTITION BY city) as city_avg
  FROM AvgTemperatures t
) x
WHERE city_avg > 16

Upvotes: 1

Related Questions