Mana
Mana

Reputation: 95

Trying to understand why GROUP BY and HAVING yields different output when compared to WHERE

I have two queries that attempt to return all origin cities with flights less than 3 hours (180 minutes)

SELECT DISTINCT F.origin_city AS city, MAX(F.actual_time) AS max_time
FROM FLIGHTS F
GROUP BY F.origin_city
HAVING MAX(F.actual_time) < 180
ORDER BY F.origin_city


SELECT DISTINCT F.origin_city AS city, MAX(F.actual_time) AS max_time
FROM FLIGHTS F
WHERE F.actual_time < 180
GROUP BY F.origin_city
ORDER BY F.origin_city

For the first query, the first 10 rows (total of 109 rows returned) it returns are:

city max_time
Aberdeen SD 106
Abilene TX 111
Alpena MI 80
Ashland WV 84
Augusta GA 176
Barrow AK 118
Beaumont/Port Arthur TX 102
Bemidji MN 104
Bethel AK 95
Binghamton NY 132

The second query's first 10 rows (total of 325 rows returned) are:

city max_time
Aberdeen SD 106
Abilene TX 111
Adak Island AK 165
Aguadilla PR 177
Akron OH 179
Albany GA 111
Albany NY 179
Albuquerque NM 179
Alexandria LA 179
Allentown/Bethlehem/Easton PA 177

The first query is filtering for the condition before grouping, whereas the second query is filtering for the condition after grouping. However, why does the first query yield fewer rows than the second query's output?

Upvotes: 1

Views: 119

Answers (2)

ekochergin
ekochergin

Reputation: 4129

The first query shows the cities having MAX(actual_time) < 180

HAVING MAX(F.actual_time) < 180

whereas second query shows just the cities with actual time < 180

WHERE F.actual_time < 180

Let's say there is a following data in your table some_city: 90 some_city: 200 some_city: 300

The first query will get the max value (which is 300) and won't show it in the output. Meanwhile the second query will output the line "some_city: 90" as it satistfies the where-condition

Upvotes: 3

T. Peter
T. Peter

Reputation: 887

WHERE and HAVING filter rows in different timing:

  1. WHERE kicks in before GROUP

  2. HAVING kicks in after GROUP

for example lets have this following data :

|colA|colB|
|----|----|
|   A| 178|
|   A| 179|
|   A| 180|
|   A| 181|

in your query with WHERE, it will filter everything <180 which leave us :

|colA|colB|
|----|----|
|   A| 178|
|   A| 179|

and then start group and continue processing.

but in query with HAVING, it start to group first, then find the max :

|colA|colB|
|----|----|
|   A| 178|
|   A| 179|
|   A| 180|
|   A| 181|

find the max(colB):

|colA|max(colB)|
|----|---------|
|   A|      181|

and seem like the max(colB) is bigger than 180 which take the whole group of A entries out of final result.

Upvotes: 2

Related Questions